Chi-Square Test Formula

The Formula

\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}

When to use: comparing observed counts in categories with the counts a model predicts. For example, you expect a fair die to land on each face about \frac{1}{6} of the time. You roll it 600 times and compare the observed counts with the expected counts (600/6 = 100 per face). If the differences are small, the data are consistent with a fair die; if they are large, the die may be biased. The chi-square statistic measures how far the observed counts are from the expected counts.

Quick Example

Roll a die 60 times. Expected: 10 per face. Observed: 8, 12, 11, 7, 13, 9. \chi^2 = \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} + \cdots + \frac{(9-10)^2}{10} = 2.8. Compare to the \chi^2 critical value with df = 5 (11.07 at \alpha = 0.05). Since 2.8 < 11.07, the result is not significant: there is no evidence that the die is unfair.
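The same calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the counts are given as plain lists and using the table critical value 11.07 (\alpha = 0.05, df = 5) quoted in Example 1; the function name `chi_square_statistic` is hypothetical, not from any library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12, 11, 7, 13, 9]   # counts for faces 1..6 from 60 rolls
expected = [60 / 6] * 6            # fair die: 10 per face
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1             # goodness-of-fit: k - 1
critical = 11.07                   # chi-square table value, alpha = 0.05, df = 5

print(f"chi2 = {stat:.1f}, df = {df}")
print("reject H0" if stat > critical else "fail to reject H0")
```

Running it prints a statistic of 2.8 with df = 5 and the decision "fail to reject H0", matching the hand calculation.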

Notation

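\chi^2 is the test statistic, k is the number of categories, and r and c are the numbers of rows and columns in the contingency table. Degrees of freedom: goodness-of-fit df = k - 1; independence/homogeneity df = (r-1)(c-1).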

What This Formula Means

The chi-square test is a family of hypothesis tests that use the chi-square statistic to compare observed frequencies with expected frequencies. The three main types are goodness-of-fit (do the data match a claimed distribution?), test of independence (are two categorical variables related?), and test of homogeneity (do different populations have the same distribution?).

Formal View

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, where O_i is the observed count and E_i the expected count in category i of k categories; df = k - 1 (goodness-of-fit) or (r-1)(c-1) (independence and homogeneity)
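For the independence and homogeneity tests the counts sit in an r × c table, so the sum runs over every cell. The block below states the standard expected-count rule for that case (row total times column total, divided by the grand total); the symbols R_i (total of row i), C_j (total of column j) and n (grand total) are introduced here for illustration and are not used elsewhere on this page.

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r-1)(c-1)
```

For instance, in the 2×2 table of Example 2, the expected count for men preferring A is (50 × 45)/100 = 22.5.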

Worked Examples

Example 1

medium
A die is rolled 60 times. Observed: 1→8, 2→12, 3→9, 4→11, 5→13, 6→7. Conduct a chi-square goodness-of-fit test at \alpha=0.05.

Solution

  1. Expected under H_0 (fair die): E = 60/6 = 10 for each outcome
  2. \chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(13-10)^2}{10} + \frac{(7-10)^2}{10}
  3. = \frac{4+4+1+1+9+9}{10} = \frac{28}{10} = 2.8
  4. df = 6-1 = 5; critical value \chi^2_{0.05,5} = 11.07; since 2.8 < 11.07, fail to reject H_0

Answer

\chi^2 = 2.8 < 11.07. Fail to reject H_0. No evidence the die is unfair.
The chi-square goodness-of-fit test compares observed frequencies to expected frequencies under a null model. Large \chi^2 values (in the critical region) indicate the observed distribution differs from expected. Degrees of freedom = (categories - 1).
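Example 1 uses the critical-value approach. An equivalent route is to compute the p-value P(\chi^2_{5} \ge 2.8) and compare it with \alpha. The sketch below assumes only Python's standard library and uses the closed-form upper-tail series that exist for integer degrees of freedom; `chi2_upper_tail` is a hypothetical helper, and in practice a statistics library (for example SciPy's `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square with a positive integer number of degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2*Q(sqrt x) + 2*phi(sqrt x) * sum_{k=1}^{(df-1)/2} x^((2k-1)/2) / (1*3*...*(2k-1)),
    # where Q is the standard normal upper tail and phi is the standard normal density.
    root = math.sqrt(x)
    normal_tail = 0.5 * math.erfc(root / math.sqrt(2))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return 2 * normal_tail + 2 * phi * total


p = chi2_upper_tail(2.8, df=5)   # statistic and df from Example 1
print(f"p-value = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

With the Example 1 values this gives a p-value of roughly 0.73, far above 0.05, which agrees with the critical-value decision. As a sanity check, evaluating the function at a table critical value (such as 11.07 with df = 5) should return approximately 0.05.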

Example 2

hard
A 2×2 table: Men: 30 prefer A, 20 prefer B. Women: 15 prefer A, 35 prefer B. Test independence of gender and preference at \alpha=0.05.
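Example 2 is stated without a worked solution. A minimal Python sketch of the standard test of independence for this table follows. It assumes the usual expected-count rule (row total × column total / grand total), applies no continuity correction, and compares against the table value \chi^2_{0.05,1} = 3.841; the helper name `independence_test` is hypothetical.

```python
def independence_test(table):
    """Return (chi-square statistic, df, expected counts) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Rows: men, women; columns: prefer A, prefer B
table = [[30, 20],
         [15, 35]]
stat, df, expected = independence_test(table)
critical = 3.841   # chi-square table value, alpha = 0.05, df = 1

print("expected:", expected)
print(f"chi2 = {stat:.2f}, df = {df}")
print("reject H0" if stat > critical else "fail to reject H0")
```

The expected counts are 22.5 (prefer A) and 27.5 (prefer B) in each row, all at least 5, and the statistic is about 9.09 with df = 1. Since 9.09 > 3.841, H_0 is rejected: the sample gives evidence that preference is associated with gender. Some textbooks apply Yates' continuity correction to 2×2 tables; it lowers the statistic here (to about 7.92) but does not change the decision.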

Common Mistakes

  • Using chi-square on numerical (continuous) data instead of categorical (count) data.
  • Forgetting to check that all expected counts are at least 5. Small expected counts make the chi-square approximation unreliable.
  • Confusing independence and homogeneity tests. They use the same formula but ask different questions and arise from different study designs.
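The expected-count check from the list above is easy to automate. This is a minimal sketch, assuming the common rule of thumb that every expected (not observed) count should be at least 5; some textbooks use looser versions of the rule. The helper `small_expected_cells` is hypothetical. With a fair die, 60 rolls give 10 per face and pass, while 24 rolls give only 4 per face and fail.

```python
def small_expected_cells(expected, minimum=5.0):
    """Return the indices of categories whose expected count is below `minimum`."""
    return [i for i, e in enumerate(expected) if e < minimum]


# A fair die rolled n times has expected count n/6 per face.
for n in (60, 24):
    expected = [n / 6] * 6
    flagged = small_expected_cells(expected)
    status = "approximation questionable" if flagged else "all expected counts >= 5"
    print(f"n = {n}: expected {n / 6:.1f} per face -> {status}")
```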

Why This Formula Matters

Essential for analyzing categorical data: survey responses, genetic ratios, market research categories, and any situation where you compare proportions across groups.

Frequently Asked Questions

What do students get wrong about Chi-Square Test?

Students struggle to distinguish the three types: goodness-of-fit tests one variable against a claimed distribution; independence tests the relationship between two variables measured on a single sample; homogeneity compares the distribution of one variable across several populations.

What should I learn before the Chi-Square Test formula?

Before studying the Chi-Square Test formula, you should understand hypothesis testing, p-values, and probability.