Correlation Coefficient Formula

The correlation coefficient (Pearson's r) is a number between −1 and 1 that measures both the strength and direction of the linear relationship between two quantitative variables.

The Formula

$$r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i-\bar{x})^2 \sum(y_i-\bar{y})^2}}$$

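When to use: to summarize how closely two quantitative variables follow a straight line. r = 1 means the points fall exactly on a line with positive slope, r = −1 means they fall exactly on a line with negative slope, and r = 0 means there is no linear pattern.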

Quick Example

Height and weight: r ≈ 0.7, a moderate positive correlation — taller people tend to weigh more.

What This Formula Means

The correlation coefficient (Pearson's r) is a number between −1 and 1 that measures both the strength and direction of the linear relationship between two quantitative variables. A value of 1 indicates a perfect positive linear relationship, −1 a perfect negative linear relationship, and 0 no linear relationship at all.

Formal View

For paired observations $(x_i, y_i)$, Pearson's correlation coefficient is $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}}$, where $r \in [-1, 1]$.
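The formula can be translated into code almost term by term. The following is a minimal sketch in plain Python, not a reference implementation; the function name `pearson_r` and the height and weight values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the two means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, made up to exercise the function.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 71, 75, 90]
r = pearson_r(heights_cm, weights_kg)
print(round(r, 3))
```

The explicit check for a constant variable is a design choice in this sketch: the denominator is zero in that case, so r is undefined rather than 0.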

Worked Examples

Example 1

medium
Given $r = 0.6$, compute $R^2$ and interpret it.

Answer

$R^2 = 0.36$, so 36% of the variance is explained.

First step

1. $R^2 = r^2 = 0.6^2 = 0.36$.
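The identity used in this step, that R² equals r² for a straight-line fit with one predictor, can be checked numerically. The sketch below fits a least-squares line to a small hypothetical data set (the values are made up for illustration) and compares r² with the share of variance explained by that line.

```python
import math

# Hypothetical paired data, used only for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Pearson's r from the definition.
r = sxy / math.sqrt(sxx * syy)

# Least-squares line: y_hat = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / syy

print(round(r ** 2, 6), round(r_squared, 6))

# The worked example itself: r = 0.6 gives R^2 = 0.36.
example_r_squared = 0.6 ** 2
```

The two printed values agree up to floating-point rounding, which is why squaring r is enough in the example.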

Example 2

medium
Five points: $(1,2), (2,4), (3,6), (4,8), (5,10)$. Compute $r$ without a formula.
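One way to approach this, presumably the intended one, is to notice that every y value is exactly twice its x value. The points therefore lie on a single line with positive slope, which is the r = 1 case. The sketch below checks that observation and cross-checks it against the definition of r; the variable names are illustrative.

```python
import math

points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

# By inspection: every point satisfies y = 2x, a line with positive slope.
on_line = all(y == 2 * x for x, y in points)

# Cross-check with the definition of Pearson's r.
xs = [x for x, _ in points]
ys = [y for _, y in points]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

print(on_line, r)
```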

Example 3

medium
Given $R^2 = 0.49$ and a positive scatterplot slope, find $r$.
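Because R² = r² for a straight-line fit with one predictor, the size of r is the square root of R², and the sign of r matches the sign of the slope. A minimal sketch of that reasoning follows; the helper name `r_from_r_squared` is hypothetical, and the slope argument only needs to carry the correct sign.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover r from R^2 and the sign of the fitted slope (one predictor only)."""
    if not 0 <= r_squared <= 1:
        raise ValueError("R^2 must lie between 0 and 1")
    if slope == 0:
        return 0.0  # a flat fitted line corresponds to r = 0
    # Magnitude from the square root, sign copied from the slope.
    return math.copysign(math.sqrt(r_squared), slope)


# Any positive slope gives the same answer; 1.0 is a placeholder.
r = r_from_r_squared(0.49, slope=1.0)
print(round(r, 2))
```

With a negative slope, the same function returns the negative root instead.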

Common Mistakes

  • Assuming r measures nonlinear relationships - r only describes how closely the points follow a straight line. A strong curved pattern can produce r near 0, so look at the scatterplot before interpreting r.
  • Confusing correlation with causation - A large |r| shows that two variables move together, not that one causes the other. Ask "Am I studying a relationship between variables, and have I separated association from causation?" and consider hidden variables before drawing a causal conclusion.
  • Ignoring outliers that inflate or deflate r - A single extreme point can change r substantially. Plot the data, and state the data source and which observations are included before interpreting the result.
  • Choosing the correlation coefficient from a keyword alone - Keywords like "relationship", "association", or "predict" are only clues; the data structure must match the concept, which here means paired measurements of two quantitative variables.
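Two of the mistakes above, nonlinear patterns and outliers, can be illustrated with small made-up data sets. In the sketch below, the helper `pearson_r` and all data values are hypothetical: the first case is a perfect parabola whose r is 0, and the second shows one extreme point turning a pattern with r near 0 into a strong apparent correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the definition (assumes neither variable is constant)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Case 1: y is completely determined by x (y = x^2), yet the relationship is not linear.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Case 2: a zigzag with no linear trend, then the same data plus one extreme point.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 3, 2, 3, 2]
r_base = pearson_r(base_x, base_y)
r_outlier = pearson_r(base_x + [20], base_y + [20])

print(round(r_curve, 3), round(r_base, 3), round(r_outlier, 3))
```

In both cases the scatterplot tells the real story, which is the reason to plot the data before quoting r.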

Why This Formula Matters

The correlation coefficient gives students a careful language for comparing variables without jumping to a causal story. It is useful for reading scatter plots, two-way tables, regression models, and real-world claims where patterns are tempting but hidden variables may matter.

Frequently Asked Questions

What is the Correlation Coefficient formula?

The correlation coefficient (Pearson's r) is a number between −1 and 1 that measures both the strength and direction of the linear relationship between two quantitative variables. A value of 1 indicates a perfect positive linear relationship, −1 a perfect negative linear relationship, and 0 no linear relationship at all.

How do you use the Correlation Coefficient formula?

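Compute r from paired data with the formula, then read its sign and size: r = 1 means the points fall exactly on a line with positive slope, r = −1 means they fall exactly on a line with negative slope, and r = 0 means there is no linear pattern.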

Why is the Correlation Coefficient formula important in Statistics?

The correlation coefficient gives students a careful language for comparing variables without jumping to a causal story. It is useful for reading scatter plots, two-way tables, regression models, and real-world claims where patterns are tempting but hidden variables may matter.

What do students get wrong about Correlation Coefficient?

Students often know a procedure related to the correlation coefficient but skip the recognition step: "Am I studying a relationship between variables, and have I separated association from causation?" Skipping it leads to a calculation or graph that looks reasonable but answers a different question.

What should I learn before the Correlation Coefficient formula?

Before studying the Correlation Coefficient formula, you should understand the basic idea of correlation and the line of best fit.