Math · Statistics & Probability · Grade 9-12 · 5 min read

Coefficient of Determination

⚡ In one breath

The coefficient of determination $r^2$ is the proportion of total variation in $y$ explained by the linear relationship with $x$, equal to the square of the correlation $r$.

📐 The formula

$$r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$

Orient

The one-line idea, why it matters, and the intuition.

Section 1

Quick Answer

The coefficient of determination $r^2$ is the proportion of total variation in $y$ explained by the linear relationship with $x$, equal to the square of the correlation $r$. Use it to report how well the regression line accounts for the spread in $y$. The cue is the phrase 'percent of variation explained': a number from 0 to 1, never negative, never a slope. Before calculating, ask: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?

Section 2

Why This Matters

$r^2$ is the standard one-number report card for a regression's predictive usefulness, and squaring $r$ exposes how much weaker a 'decent' correlation really is ($r=0.7$ explains only 49%). Mixing it up with $r$ or with causation is what leads people to overstate how much a model actually tells them. Recognizing it by the question "Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?" rather than by familiar numbers is what lets a student tell it apart from correlation $r$, slope $b$, and residual variation in a mixed problem set.

Section 3

Intuitive Explanation

Picture a bar showing the total spread of $y$ split into two pieces: the part the line explains and the leftover residual part. $r^2$ is the fraction of the bar taken up by the explained piece. This is the clean version of the idea because the visible structure matches the concept before any formula or procedure is chosen.

The common misreading is to treat $r^2$ as proof that $x$ causes $y$, or to confuse it with the slope. $r^2$ only says how much variation is explained, with no direction and no causal claim. That contrast matters because many wrong answers come from recognizing a surface feature, such as a familiar number or word, instead of the actual task.

A useful way to slow down is to name the signal words and then test them. Words like **percent of variation explained**, **proportion explained**, **square of correlation**, **goodness of fit**, **between 0 and 1** are helpful clues, but they are not enough by themselves. They must point to the same structure as the mental model: $r^2$ is the proportion of variation in $y$ accounted for by the linear relationship with $x$, which equals the square of the correlation.

The recognition test is simple: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign? If yes, coefficient of determination is probably the right tool; if not, compare with correlation $r$, slope $b$, or residual variation before calculating.

Core idea

$r^2$ is the proportion of variation in $y$ accounted for by the linear relationship with $x$; it equals the square of the correlation.
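
The bar picture can be checked numerically. The sketch below assumes a small made-up data set and a hypothetical helper named `variation_split` (neither comes from the lesson). It fits a least-squares line and splits the total variation in $y$ into an explained piece and a residual piece; the explained share is $r^2$.

```python
def variation_split(xs, ys):
    """Split the total variation in ys into explained and residual parts
    for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least-squares slope and intercept.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    ss_total = sum((y - y_bar) ** 2 for y in ys)         # whole bar
    ss_explained = sum((f - y_bar) ** 2 for f in fitted)  # explained piece
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # leftover
    return ss_total, ss_explained, ss_residual

# Made-up data, for illustration only.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
ss_total, ss_explained, ss_residual = variation_split(xs, ys)
print("explained share:", ss_explained / ss_total)
print("residual share:", ss_residual / ss_total)
```

For a least-squares line the two pieces add up to the whole bar, so the two shares sum to 1 (up to floating-point rounding).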

Recognize

The cues that signal this concept and how to distinguish it from look-alikes.

Section 4

When to Use

Use Coefficient of Determination when you need to report what proportion of the variation in $y$ a linear model explains. Strong signals include **percent of variation explained**, **proportion explained**, **square of correlation**, **goodness of fit**, **between 0 and 1**. The safest workflow is to read the final question first, identify what kind of answer it wants, and then test the structure. Do not use coefficient of determination just because familiar numbers appear; first decide whether the situation answers "Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?" with yes.

✨ Pro tip

Ask: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?

Section 5

How to Recognize It

Before using Coefficient of Determination, check the structure of the problem, not just the vocabulary. These questions force the same recognition move from several angles: the task, the signal words, the nearest confusion, and the thing that would make the concept fail.

  1. Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?

    If yes, the problem matches coefficient of determination. If no, pause before applying the procedure, because the same numbers may belong to a different idea.

  2. Which words signal the structure?

    Look for percent of variation explained, proportion explained, square of correlation, goodness of fit. These words are useful only after the situation matches them; a keyword without structure is not proof.

  3. What is the nearest confusion?

    Correlation $r$ is the common trap here: it carries the sign and direction of association, ranging from $-1$ to $1$, while $r^2$ drops the sign and squares it. Compare the desired final answer before choosing a method.

  4. What answer form should I expect?

    The answer should fit this mental model: $r^2$ is the proportion of variation in $y$ accounted for by the linear relationship with $x$, equal to the square of the correlation. If the expected answer sounds more like correlation $r$, use the comparison table before solving.

  5. What would make this NOT Coefficient of Determination?

    The task is not about the coefficient of determination if it treats $r^2$ as proof that $x$ causes $y$, or if it really asks for the slope. $r^2$ only says how much variation is explained, with no direction and no causal claim. This tells you when to switch tools instead of forcing the concept.

Section 6

Coefficient of Determination vs Common Confusions

The hard part is recognizing when the task is really about coefficient of determination instead of a nearby idea. Read the final answer the problem wants, then ask which row describes the structure before you start calculating.

Coefficient of Determination

Meaning
Use this when you need to report what proportion of the variation in $y$ a linear model explains. The deciding question is: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?
Key test
Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?
Formula
$r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$
Example
A regression of weight on height gives correlation $r=0.75$. Find and interpret $r^2$.

Correlation $r$

Meaning
Carries the sign and direction of association, ranging from $-1$ to $1$; $r^2$ drops the sign and squares it.
Key test
Use $r$ when you need direction (positive/negative) of the relationship.
Formula
$r^2=(r)^2$
Example
$r=-0.9$ vs $r^2=0.81$

Slope $b$

Meaning
The rate at which $y$ changes per unit of $x$, carrying units; $r^2$ is a unitless fraction of variation.
Key test
Use the slope when predicting the change in $y$ per unit $x$.
Formula
$b=r\frac{s_y}{s_x}$
Example
0.6 kg per cm

Residual variation

Meaning
The unexplained leftover, equal to $1-r^2$ of the total.
Key test
Use when describing what the model fails to capture.
Formula
$1-r^2$
Example
15% unexplained when $r^2=0.85$
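
To see the four quantities side by side, here is a minimal sketch that computes each one from the same made-up data set. The data values and variable names are assumptions for illustration; the slope uses the formula $b=r\frac{s_y}{s_x}$ from the comparison above.

```python
import math

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)   # correlation: keeps the sign
s_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of x
s_y = math.sqrt(syy / (n - 1))   # sample standard deviation of y
b = r * s_y / s_x                # slope: change in y per unit x
r2 = r ** 2                      # coefficient of determination
unexplained = 1 - r2             # residual share of the variation

print(f"r = {r:.3f}, r^2 = {r2:.3f}, slope = {b:.3f}, unexplained = {unexplained:.3f}")
```

Only `r` and `b` carry a sign, and only `b` carries units; `r2` and `unexplained` are unitless shares that add to 1.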

Apply

Worked examples and the mistakes most students make.

Section 7

Formula & Notation

$$r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}, \quad \text{where } 0 \leq r^2 \leq 1$$

How to read it: $r^2$ ranges from 0 to 1. $\text{SS}_{\text{total}}$ is the total sum of squares and $\text{SS}_{\text{residual}}$ is the residual sum of squares. Here $y_i$ is an observed value, $\hat{y}_i$ is the value the line predicts for it, and $\bar{y}$ is the mean of the observed values.
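
As a minimal sketch of the formula above, the Python below fits a least-squares line to a small made-up data set and computes $r^2$ as one minus the ratio of the residual to the total sum of squares. The data and the helper names `fit_line` and `r_squared` are illustrative assumptions, not part of the lesson.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(xs, ys):
    """Compute r^2 = 1 - SS_residual / SS_total for the fitted line."""
    intercept, slope = fit_line(xs, ys)
    y_bar = mean(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 4))  # prints 0.6
```

For this data the residual sum of squares is 2.4 and the total is 6, so the line explains 60% of the variation in $y$.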

Section 8

Worked Examples

Example 1 — Interpreting a fit

Easy

Problem

A regression of weight on height gives correlation $r=0.75$. Find and interpret $r^2$.

Solution

  1. We want the proportion of variation in weight explained by height — square the correlation.

    Name the structure before touching arithmetic — that is what makes the right method obvious.

  2. Ask the recognition question: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?

    If the answer is yes, the concept applies; the cue, not a keyword, decides the method.

  3. Compute $r^2=(0.75)^2=0.5625$.

    The rule is chosen only after the structure matches, so the steps mean something.

  4. About 56% of the variation in weight is explained by the linear relationship with height; 44% is unexplained.

    Keep units, shape, or answer form tied to the story so the work does not become symbol pushing.

  5. Check the answer against the original question.

    It should fit the mental model — the fraction of the wiggle the line explains. If it does not, revisit the recognition step before changing the arithmetic.

Answer

$r^2\approx 0.56$

Takeaway: Square the correlation to get the fraction of $y$'s variation the line explains.
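
The arithmetic in Example 1 can be reproduced in a few lines. This is only a sketch of the calculation; the variable names are chosen here for illustration.

```python
r = 0.75                  # correlation given in the problem
r2 = r ** 2               # coefficient of determination
explained_pct = 100 * r2
unexplained_pct = 100 * (1 - r2)

print(f"r^2 = {r2}")  # prints r^2 = 0.5625
print(f"{explained_pct:.0f}% explained, {unexplained_pct:.0f}% unexplained")
# prints 56% explained, 44% unexplained
```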

Example 2 — Asking for direction

Standard

Problem

You're told $r^2=0.81$ and asked whether the relationship is positive or negative. Can $r^2$ answer that?

Solution

  1. Notice why this looks like the same concept.

    Nearby language or numbers can tempt you toward the fraction of the wiggle the line explains.

  2. Squaring erased the sign: $r^2=0.81$ could come from $r=+0.9$ or $r=-0.9$.

    Spotting what actually changed is what separates this from the concept it resembles.

  3. Go back to $r$ (or the slope's sign) to get direction; $r^2$ alone can't.

    The nearby idea may share numbers but answers a different question, so it needs a different move.

  4. State the result in the language of the actual task.

    No, $r^2$ can't give direction. Name it for what the problem really asked, not the concept you first expected.

  5. Say the contrast in one sentence.

    $r^2$ measures strength of explained variation only; the sign lives in $r$ or the slope.

Answer

No, $r^2$ can't give direction.

Takeaway: $r^2$ measures strength of explained variation only; the sign lives in $r$ or the slope.
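
One way to see the lost sign is to compare a data set with its mirror image. The sketch below uses made-up numbers and a hypothetical `correlation` helper: flipping every $y$ value reverses the direction of the relationship, yet $r^2$ comes out the same.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]           # tends to rise with x
ys_down = [-y for y in ys_up]     # mirror image: tends to fall with x

r_up = correlation(xs, ys_up)
r_down = correlation(xs, ys_down)

print("r values:  ", round(r_up, 3), round(r_down, 3))
print("r^2 values:", round(r_up ** 2, 3), round(r_down ** 2, 3))
```

The two correlations differ only in sign, so anyone handed just the $r^2$ value cannot tell which data set it came from.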

Example 3 — Spot the trap: The fraction of the wiggle the line explains

Application

Problem

A student starts with this idea: "Reporting $r$ when the question asks for $r^2$." What should they check before accepting that reasoning?

Solution

  1. Pause before the first move.

    The first move is a decision, not a calculation: does the situation really match the fraction of the wiggle the line explains?

  2. Run the recognition test: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?

    This is the single check that the trap skips.

  3. Square the correlation; $r=0.7$ gives $r^2=0.49$, not 0.7.

    Stating the safer rule turns the mistake into a checkable step instead of a vague "be careful."

  4. Compare with the nearest confusion, correlation $r$.

    Correlation $r$ carries the sign and direction of association, ranging from $-1$ to $1$; $r^2$ drops the sign and squares it.

  5. State the corrected decision and reuse it.

    Using the concept only when the structure matches leaves a process the student can repeat on a new problem.

Answer

Square the correlation; $r=0.7$ gives $r^2=0.49$, not 0.7.

Takeaway: The recognition step prevents the common trap of reporting $r$ when the question asks for $r^2$.

Section 9

Common Mistakes

Common slip-up

Reporting $r$ when the question asks for $r^2$

The right idea

Square the correlation; $r=0.7$ gives $r^2=0.49$, not 0.7.

Common slip-up

Reading $r^2$ as causation

The right idea

It measures explained variation, never that $x$ causes $y$.

Common slip-up

Letting $r^2$ go negative or above 1

The right idea

For a least-squares line it is a proportion between 0 and 1, so any value outside that range signals a calculation error.
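
The range rule can be turned into an automatic sanity check. The helper below, `check_r_squared`, is a hypothetical name used only for this sketch; it rejects any value that cannot be a proportion.

```python
def check_r_squared(value):
    """Return value if it is a valid proportion, else raise ValueError."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{value} is outside [0, 1]; recheck the calculation")
    return value

for candidate in (0.49, -0.2, 1.3):
    try:
        check_r_squared(candidate)
        print(candidate, "is a valid proportion")
    except ValueError as err:
        print(err)
```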

Practice

Try it, then see where this concept fits in the path.

Section 10

Mini Practice

Try these on your own, then use the hint under each one to check your thinking.

  1. What clue tells you this is a Coefficient of Determination situation: A regression of weight on height gives correlation $r=0.75$. Find and interpret $r^2$.

    Hint: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?

  2. A regression of weight on height gives correlation $r=0.75$. Find and interpret $r^2$.

    Hint: Compute $r^2=(0.75)^2=0.5625$.

  3. Why is this a contrast case instead of Coefficient of Determination: You're told $r^2=0.81$ and asked whether the relationship is positive or negative. Can $r^2$ answer that?

    Hint: Squaring erased the sign: $r^2=0.81$ could come from $r=+0.9$ or $r=-0.9$.

  4. Fix this thinking: Reporting $r$ when the question asks for $r^2$

    Hint: Name the recognition cue before choosing a rule.

  5. Which is the better fit here: Coefficient of Determination or Correlation $r$? Explain the deciding difference.

    Hint: For Coefficient of Determination, ask: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?

  6. Write one sentence that would remind a classmate how to recognize Coefficient of Determination.

    Hint: Use the mental model "The fraction of the wiggle the line explains." and one signal word.


Section 11

Frequently Asked Questions

How do I know when to use Coefficient of Determination?

Use Coefficient of Determination when you need to report what proportion of the variation in $y$ a linear model explains. Do not start from the numbers alone; first name the structure of the situation. The fastest check is: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign? If the answer is yes and the wording matches cues like percent of variation explained, proportion explained, square of correlation, then coefficient of determination is probably the right tool.

What is Coefficient of Determination most often confused with?

Coefficient of Determination is often confused with Correlation $r$. Correlation $r$ carries the sign and direction of association, ranging from $-1$ to $1$; $r^2$ drops the sign and squares it. The difference is not just vocabulary; it changes the action you take. For coefficient of determination, the key test is "Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign?" For correlation $r$, the better cue is: Use $r$ when you need direction (positive/negative) of the relationship.

What is the fastest recognition cue for Coefficient of Determination?

Look for percent of variation explained, proportion explained, square of correlation, goodness of fit, but treat those words as clues, not proof. A word problem can contain a familiar keyword and still ask for a different idea. After noticing the cue, ask the recognition question: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign? That question protects you from using a memorized procedure in the wrong place.

What mistake should I avoid with Coefficient of Determination?

Avoid this thinking: "Reporting $r$ when the question asks for $r^2$." That mistake usually happens when the student jumps to a rule before checking the situation. The safer version is: square the correlation; $r=0.7$ gives $r^2=0.49$, not 0.7. A good habit is to say the mental model out loud first: "The fraction of the wiggle the line explains." Then choose the calculation or representation.

How can I tell this apart from Slope $b$?

Slope $b$ is the better fit when the task is about the rate at which $y$ changes per unit of $x$, a quantity that carries units; $r^2$ is a unitless fraction of variation. Coefficient of Determination is the better fit when you need to report what proportion of the variation in $y$ a linear model explains. If both ideas seem possible, compare what the problem wants as the final answer. The desired output often reveals whether you should use coefficient of determination or switch to the nearby concept.

Why does Coefficient of Determination matter?

$r^2$ is the standard one-number report card for a regression's predictive usefulness, and squaring $r$ exposes how much weaker a 'decent' correlation really is ($r=0.7$ explains only 49%). Mixing it up with $r$ or with causation is what leads people to overstate how much a model actually tells them. The practical value is recognition: once you can spot coefficient of determination, you can choose a method before calculating. That makes later topics easier because you are not memorizing isolated tricks; you are recognizing the same structure when it appears in a new representation.

Section 12

Learning Path

Coefficient of Determination

You are here

Before this, students should be comfortable with Correlation and Least Squares Regression Line. This page focuses on the recognition cue: Am I reporting the fraction of $y$'s variation explained by the linear model (a 0-to-1 number), not the slope or the correlation's sign? That cue is the bridge between earlier skills and later problem solving: students first learn to identify the structure, then they learn which calculation, diagram, graph, or proof move belongs to it. After this, Inference for Regression becomes easier to recognize.
