
Chi-Square Test

⚡ In one breath

A chi-square test compares observed category counts to expected counts using $\chi^2=\sum\frac{(O-E)^2}{E}$ to test goodness of fit or independence.

📐 The formula

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Orient

The one-line idea, why it matters, and the intuition.

Section 1

Quick Answer

A chi-square test compares observed category counts to expected counts using $\chi^2=\sum\frac{(O-E)^2}{E}$ to test goodness of fit or independence. Use it when your data are counts in categories and you ask whether they match a claimed distribution or whether two categorical variables are independent. The cue is categorical frequencies (not means or proportions on a continuous scale). Before calculating, ask: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?

Section 2

Why This Matters

It's the standard test for categorical data, where a mean-based t-test simply doesn't apply because there's nothing to average. Knowing that big squared deviations relative to expected counts drive a large $\chi^2$ is what connects 'the counts look off' to a real, p-value-backed conclusion about fairness or association. Recognizing it by asking "Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?", rather than by familiar numbers, is what lets a student tell it apart from a two-sample t-test, a two-proportion z-test, and correlation / LSRL in a mixed problem set.

Section 3

Intuitive Explanation

Picture a tally sheet from rolling a die 600 times. Each face 'should' show about 100 times; you line up the 6 observed counts against the expected 100s and add up how far off each one is, squared and scaled by the expected count. This is the clean version of the idea because the visible structure matches the concept before any formula or procedure is chosen.

A non-example is running a chi-square on raw measurements or percentages: the test needs actual counts (frequencies) in categories, not means, ratios, or already-converted proportions. That contrast matters because many wrong answers come from recognizing a surface feature, such as a familiar number or word, instead of the actual task.

A useful way to slow down is to name the signal words and then test them. Words like **observed vs expected**, **goodness of fit**, **test of independence**, **category counts**, **frequency table** are helpful clues, but they are not enough by themselves. They must point to the same structure as the mental model: The chi-square test sums squared gaps between observed and expected category counts to test independence or goodness of fit.

The recognition test is simple: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)? If yes, chi-square test is probably the right tool; if not, compare with Two-sample t-test or Two-proportion z-test or Correlation / LSRL before calculating.

Core idea

The chi-square test sums squared gaps between observed and expected category counts to test independence or goodness of fit.

Recognize

The cues that signal this concept and how to distinguish it from look-alikes.

Section 4

When to Use

Use Chi-Square Test when your data are counts in categories and you test whether they fit an expected distribution or whether two categorical variables are independent. Strong signals include **observed vs expected**, **goodness of fit**, **test of independence**, **category counts**, **frequency table**. The safest workflow is to read the final question first, identify what kind of answer it wants, and then test the structure. Do not use chi-square test just because familiar numbers appear; first decide whether the situation answers "Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?" with yes.

✨ Pro tip

Ask: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?

Section 5

How to Recognize It

Before using Chi-Square Test, check the structure of the problem, not just the vocabulary. These questions force the same recognition move from several angles: the task, the signal words, the nearest confusion, and the thing that would make the concept fail.

  1. Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?

    If yes, the problem matches chi-square test. If no, pause before applying the procedure, because the same numbers may belong to a different idea.

  2. Which words signal the structure?

    Look for observed vs expected, goodness of fit, test of independence, category counts. These words are useful only after the situation matches them; a keyword without structure is not proof.

  3. What is the nearest confusion?

    The two-sample t-test is the common trap here: it compares means of a numeric variable across two groups, not category counts. Compare the desired final answer before choosing a method.

  4. What answer form should I expect?

    The answer should fit this mental model: The chi-square test sums squared gaps between observed and expected category counts to test independence or goodness of fit. If the expected answer sounds more like two-sample t-test, use the comparison table before solving.

  5. What would make this NOT Chi-Square Test?

    A chi-square run on raw measurements or percentages is not valid: the test needs actual counts (frequencies) in categories, not means, ratios, or already-converted proportions. This tells you when to switch tools instead of forcing the concept.

Section 6

Chi-Square Test vs Common Confusions

The hard part is recognizing when the task is really about chi-square test instead of a nearby idea. Read the final answer the problem wants, then ask which row describes the structure before you start calculating.

Chi-Square Test

Meaning
Use this when your data are counts in categories and you test whether they fit an expected distribution or whether two categorical variables are independent. The deciding question is: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?
Key test
Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?
Formula
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
Example
A die is rolled 600 times; face counts are 90, 110, 105, 95, 100, 100. Test goodness of fit to a fair die at $\alpha=0.05$.

Two-sample t-test

Meaning
Compares MEANS of a numeric variable across two groups, not category counts.
Key test
Use when the response is numeric and you compare averages.
Formula
$t=\frac{\bar{x}_1-\bar{x}_2}{\text{SE}}$
Example
Comparing mean test scores of two classes

Two-proportion z-test

Meaning
Compares two proportions directly; chi-square of a 2×2 table is equivalent but generalizes to bigger tables.
Key test
Use for exactly two groups and one yes/no outcome.
Formula
$z=\frac{\hat{p}_1-\hat{p}_2}{\text{SE}}$
Example
Comparing vaccination rates in two towns

Correlation / LSRL

Meaning
Measures linear association between two NUMERIC variables, not categorical association.
Key test
Use when both variables are quantitative.
Formula
$r=\ldots$
Example
Height vs weight relationship
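
The equivalence noted in the two-proportion z-test row can be checked numerically. The sketch below is only an illustration: the vaccination counts are hypothetical, the helper names are made up for this example, the z statistic is assumed to use the pooled standard error, and no continuity correction is applied. Under those assumptions, $z^2$ matches the chi-square statistic of the 2×2 table.

```python
import math

def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Sum of (O - E)^2 / E over every cell of a two-way table of counts."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z statistic with a pooled standard error (an assumption here)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical data: rows are towns, columns are vaccinated / not vaccinated.
table = [[60, 40], [45, 55]]
chi2 = chi_square_independence(table)
z = two_proportion_z(60, 100, 45, 100)

print(round(chi2, 4), round(z ** 2, 4))  # the two values agree
```

The chi-square version keeps working when the table grows beyond 2×2, which is where the z-test stops applying.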

Apply

Worked examples and the mistakes most students make.

Section 7

Formula & Notation

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed count and $E_i$ is the expected count in category $i$; $df = k - 1$ (goodness of fit) or $df = (r-1)(c-1)$ (independence).

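How to read it: $\chi^2$ is the test statistic. Degrees of freedom: goodness-of-fit $df = k - 1$, where $k$ is the number of categories; independence/homogeneity $df = (r-1)(c-1)$, where $r$ and $c$ are the numbers of rows and columns in the two-way table.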
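
The formula and the two degrees-of-freedom rules translate almost directly into code. This is a minimal sketch, not a full test procedure: the function names are chosen for this example and the three-category counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def df_goodness_of_fit(k):
    """Degrees of freedom for a goodness-of-fit test with k categories."""
    return k - 1

def df_independence(rows, cols):
    """Degrees of freedom for a test of independence on an r-by-c table."""
    return (rows - 1) * (cols - 1)

# Hypothetical counts in three categories, each expected 20 times.
stat = chi_square_statistic([18, 22, 20], [20, 20, 20])
print(stat, df_goodness_of_fit(3), df_independence(2, 3))
```

Note that the division is by the expected count in each category, never by the observed count.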

Section 8

Worked Examples

Example 1 — Is the die fair?

Easy

Problem

A die is rolled 600 times; face counts are 90, 110, 105, 95, 100, 100. Test goodness of fit to a fair die at $\alpha=0.05$.

Solution

  1. Data are counts in 6 categories compared to an expected fair distribution — a chi-square goodness-of-fit test.

    Name the structure before touching arithmetic — that is what makes the right method obvious.

  2. Ask the recognition question: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?

    If the answer is yes, the concept applies; the cue, not a keyword, decides the method.

  3. Expected count for each face $=\frac{600}{6}=100$; compute $\sum\frac{(O-E)^2}{E}$.

    The rule is chosen only after the structure matches, so the steps mean something.

  4. $\frac{(90-100)^2+(110-100)^2+(105-100)^2+(95-100)^2+0+0}{100}=\frac{100+100+25+25}{100}=2.5$, with $df=5$.

    Keep units, shape, or answer form tied to the story so the work does not become symbol pushing.

  5. Check the answer against the original question.

    It should fit the mental model — how far are the counts from what we expected. If it does not, revisit the recognition step before changing the arithmetic.

Answer

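$\chi^2=2.5$ with $df=5$. The p-value is large (about 0.78, well above $\alpha=0.05$), so we fail to reject the null hypothesis: the counts are consistent with a fair die.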

Takeaway: Sum the scaled squared gaps; a small $\chi^2$ means observed counts are close to expected.
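
The arithmetic in Example 1 can be reproduced in a few lines. This is a sketch under one stated assumption: the critical value 11.070 is the standard chi-square table entry for $df=5$ at $\alpha=0.05$, typed in by hand rather than computed.

```python
# Observed face counts from the 600 rolls in Example 1.
observed = [90, 110, 105, 95, 100, 100]

total = sum(observed)                    # 600 rolls
k = len(observed)                        # 6 categories (faces)
expected = [total / k] * k               # a fair die expects 100 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

# Assumption: table value for df = 5, alpha = 0.05 (not computed here).
CRITICAL_VALUE = 11.070
reject = chi2 > CRITICAL_VALUE

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject}")
# prints: chi2 = 2.50, df = 5, reject H0: False
```

Because 2.5 is far below the critical value, the decision matches the answer above: fail to reject.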

Example 2 — Comparing two class averages

Standard

Problem

Instead you have the numeric test scores of two classes and want to know if their mean scores differ. Chi-square?

Solution

  1. Notice why this looks like the same concept.

    Nearby language or numbers can tempt you toward the chi-square mental model of "how far are the counts from what we expected."

  2. The response is numeric and you're comparing means, not counts in categories.

    Spotting what actually changed is what separates this from the concept it resembles.

  3. Use a two-sample t-test on the means, not a chi-square on counts.

    The nearby idea may share numbers but answers a different question, so it needs a different move.

  4. State the result in the language of the actual task.

    No — use a two-sample t-test. Name it for what the problem really asked, not the concept you first expected.

  5. Say the contrast in one sentence.

    Chi-square is for category counts; comparing numeric means calls for a t-test.

Answer

No — use a two-sample t-test

Takeaway: Chi-square is for category counts; comparing numeric means calls for a t-test.

Example 3 — Spot the trap: How far are the counts from what we expected

Application

Problem

A student starts with this idea: "Divide by Observed instead of Expected." What should they check before accepting that reasoning?

Solution

  1. Pause before the first move.

    The first move is a decision, not a calculation: does the situation really match "how far are the counts from what we expected"?

  2. Run the recognition test: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?

    This is the single check that the trap skips.

  3. The formula is $\frac{(O-E)^2}{E}$, scaling each gap by the expected count.

    Stating the safer rule turns the mistake into a checkable step instead of a vague "be careful."

  4. Compare with the nearest confusion, the two-sample t-test.

    It compares means of a numeric variable across two groups, not category counts.

  5. State the corrected decision and reuse it.

    Using the concept only when the structure matches leaves a process the student can repeat on a new problem.

Answer

The formula is $\frac{(O-E)^2}{E}$, scaling each gap by the expected count.

Takeaway: The recognition step prevents the common trap of dividing by Observed instead of Expected.

Section 9

Common Mistakes

Common slip-up

Dividing by Observed instead of Expected

The right idea

The formula is $\frac{(O-E)^2}{E}$, scaling each gap by the expected count.
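
A quick numeric check shows why the denominator matters. The sketch below reuses the die counts from Example 1 and computes the statistic both ways; the variable names are illustrative only.

```python
observed = [90, 110, 105, 95, 100, 100]
expected = [100] * 6

# Correct: each squared gap is scaled by the EXPECTED count.
correct = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Slip-up: scaling by the OBSERVED count gives a different number,
# which no longer follows the chi-square reference distribution.
wrong = sum((o - e) ** 2 / o for o, e in zip(observed, expected))

print(correct, round(wrong, 4))
```

The two results are close here only because the observed counts are near 100; with counts further from expected, the gap widens.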

Common slip-up

Using percentages or means as the data

The right idea

Chi-square requires raw category counts, not proportions or averages.
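
To see why counts are required, the sketch below feeds the Example 1 die data into the formula twice, once as counts and once converted to percentages. The helper name is made up for this illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

counts = [90, 110, 105, 95, 100, 100]
n = sum(counts)                                   # 600 rolls

# Same data expressed as percentages of the total.
percents = [100 * c / n for c in counts]

from_counts = chi_square_statistic(counts, [n / 6] * 6)
from_percents = chi_square_statistic(percents, [100 / 6] * 6)

# The percentage version shrinks by a factor of 100 / n, so the sample
# size is lost and the statistic no longer means what the test assumes.
print(from_counts, round(from_percents, 4))
```

With 600 rolls the percentage version is six times too small; with 60 rolls it would be too large instead.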

Common slip-up

Using the wrong degrees of freedom

The right idea

Goodness-of-fit uses $df=k-1$; a two-way independence test uses $df=(r-1)(c-1)$.

Practice

Try it, then see where this concept fits in the path.

Section 10

Mini Practice

Try these on your own.

  1. What clue tells you this is a Chi-Square Test situation: A die is rolled 600 times; face counts are 90, 110, 105, 95, 100, 100. Test goodness of fit to a fair die at $\alpha=0.05$.

    Hint: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?

  2. A die is rolled 600 times; face counts are 90, 110, 105, 95, 100, 100. Test goodness of fit to a fair die at $\alpha=0.05$.

    Hint: Expected count for each face $=\frac{600}{6}=100$; compute $\sum\frac{(O-E)^2}{E}$.

  3. Why is this a contrast case instead of Chi-Square Test: Instead you have the numeric test scores of two classes and want to know if their mean scores differ. Chi-square?

    Hint: The response is numeric and you're comparing means, not counts in categories.

  4. Fix this thinking: Dividing by Observed instead of Expected

    Hint: Name the recognition cue before choosing a rule.

  5. Which is the better fit here: Chi-Square Test or Two-sample t-test? Explain the deciding difference.

    Hint: For Chi-Square Test, ask: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?

  6. Write one sentence that would remind a classmate how to recognize Chi-Square Test.

    Hint: Use the mental model "How far are the counts from what we expected." and one signal word.


Section 11

Frequently Asked Questions

How do I know when to use Chi-Square Test?

Use Chi-Square Test when your data are counts in categories and you test whether they fit an expected distribution or whether two categorical variables are independent. Do not start from the numbers alone; first name the structure of the situation. The fastest check is: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)? If the answer is yes and the wording matches cues like observed vs expected, goodness of fit, test of independence, then chi-square test is probably the right tool.

What is Chi-Square Test most often confused with?

Chi-Square Test is often confused with the two-sample t-test, which compares means of a numeric variable across two groups, not category counts. The difference is not just vocabulary; it changes the action you take. For the chi-square test, the key test is "Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)?" For the two-sample t-test, the better cue is that the response is numeric and you compare averages.

What is the fastest recognition cue for Chi-Square Test?

Look for observed vs expected, goodness of fit, test of independence, category counts, but treat those words as clues, not proof. A word problem can contain a familiar keyword and still ask for a different idea. After noticing the cue, ask the recognition question: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)? That question protects you from using a memorized procedure in the wrong place.

What mistake should I avoid with Chi-Square Test?

Avoid this slip: dividing by Observed instead of Expected. That mistake usually happens when the student jumps to a rule before checking the situation. The safer version: the formula is $\frac{(O-E)^2}{E}$, scaling each gap by the expected count. A good habit is to say the mental model out loud first: "How far are the counts from what we expected." Then choose the calculation or representation.

How can I tell this apart from Two-proportion z-test?

The two-proportion z-test is the better fit when the task compares two proportions directly; a chi-square test on a 2×2 table is equivalent and also generalizes to bigger tables. Chi-Square Test is the better fit when your data are counts in categories and you test whether they fit an expected distribution or whether two categorical variables are independent. If both ideas seem possible, compare what the problem wants as the final answer. The desired output often reveals whether you should use the chi-square test or switch to the nearby concept.

Why does Chi-Square Test matter?

It's the standard test for categorical data, where a mean-based t-test simply doesn't apply because there's nothing to average. Knowing that big squared deviations relative to expected counts drive a large $\chi^2$ is what connects 'the counts look off' to a real, p-value-backed conclusion about fairness or association. The practical value is recognition: once you can spot a chi-square test, you can choose a method before calculating. That makes later topics easier because you are not memorizing isolated tricks; you are recognizing the same structure when it appears in a new representation.

Section 12

Learning Path

Before this, students should be comfortable with Hypothesis Testing and P-Value. This page focuses on the recognition cue: Are the data counts of cases falling into categories, being compared to expected counts (rather than means or continuous values)? That cue is the bridge between earlier skills and later problem solving: students first learn to identify the structure, then they learn which calculation, diagram, graph, or proof move belongs to it. After this, students can use chi-square test as a tool in larger problems.
