Chi-Square Test

Statistics
process

Also known as: χ² test, chi-squared test

Grade 9-12

A family of hypothesis tests that use the chi-square statistic to compare observed frequencies to expected frequencies. Essential for analyzing categorical data: survey responses, genetics ratios, market research categories, and any situation where you compare proportions across groups.

Definition

A family of hypothesis tests that use the chi-square statistic to compare observed frequencies to expected frequencies. The three main types are: goodness-of-fit (does data match a claimed distribution?), test of independence (are two categorical variables related?), and test of homogeneity (do different populations have the same distribution?).

💡 Intuition

You expect a die to land on each face about \frac{1}{6} of the time. You roll it 600 times and compare what you observed to what you expected. If the differences are small, the die is probably fair. If they're large, something is off. The chi-square statistic measures 'how far off are the observed counts from what we expected?'

🎯 Core Idea

Chi-square tests work with categorical (count) data, not numerical measurements. Large \chi^2 values mean the observed data deviates significantly from what was expected under H_0.

Example

Roll a die 60 times. Expected: 10 per face. Observed: 8, 12, 11, 7, 13, 9. \chi^2 = \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} + \cdots + \frac{(9-10)^2}{10} = 2.8. Compare to the \chi^2 critical value with df = 5, which is 11.07 at the conventional \alpha = 0.05. Since 2.8 < 11.07, the result is not significant; the data are consistent with a fair die.
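The die example can be checked with a few lines of plain Python. This is a minimal sketch, not a full test procedure: the function name `chi_square_statistic` is made up for illustration, the significance level \alpha = 0.05 is an assumption, and the critical value 11.070 is copied from a standard \chi^2 table for df = 5 rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts for faces 1..6, taken from the example above.
observed = [8, 12, 11, 7, 13, 9]
n_rolls = sum(observed)                               # 60 rolls in total
expected = [n_rolls / len(observed)] * len(observed)  # 10 per face if the die is fair

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1                                # goodness-of-fit: k - 1

# Assumption: alpha = 0.05. The critical value for df = 5 comes from a
# standard chi-square table; it is hard-coded here, not computed.
CRITICAL_VALUE_DF5_ALPHA_05 = 11.070
reject_h0 = chi2 > CRITICAL_VALUE_DF5_ALPHA_05

print(f"chi2 = {chi2:.1f}, df = {df}, reject H0: {reject_h0}")
```

Because the statistic falls well below the critical value, the sketch does not reject H_0, matching the conclusion of the example.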

Formula

\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}

Notation

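\chi^2 is the test statistic. Degrees of freedom: goodness-of-fit df = k - 1, where k is the number of categories; independence/homogeneity df = (r-1)(c-1), where r and c are the numbers of rows and columns in the two-way table of counts.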

🌟 Why It Matters

Essential for analyzing categorical data: survey responses, genetics ratios, market research categories, and any situation where you compare proportions across groups.

Formal View

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} where O_i is observed count and E_i is expected count; df = k - 1 (GOF) or (r-1)(c-1) (independence)
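For a test of independence or homogeneity, the same sum runs over every cell of a two-way table. The sketch below assumes the standard expected-count rule E = (row total × column total) / grand total, which this page does not state explicitly. The function names and the survey counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def expected_counts(table):
    """Expected cell counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df, expected


# Hypothetical data: rows are two groups, columns are three answer choices.
table = [
    [20, 30, 50],
    [30, 30, 40],
]

chi2, df, expected = chi_square_two_way(table)
print(f"chi2 = {chi2:.3f}, df = {df}")
print("expected counts:", expected)
```

The resulting statistic is then compared to a \chi^2 critical value with the computed df, exactly as in the goodness-of-fit case.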

🚧 Common Stuck Point

Students struggle to distinguish the three types. Goodness-of-fit tests one variable against a claimed distribution; independence tests the relationship between two variables in one sample; homogeneity tests the same variable across multiple populations.

โš ๏ธ Common Mistakes

  • Using chi-square on numerical (continuous) data instead of categorical (count) data.
  • Forgetting to check that all expected counts are at least 5. Small expected counts make the chi-square approximation unreliable.
  • Confusing independence and homogeneity tests. They use the same formula but ask different questions and arise from different study designs.
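The expected-count condition from the list above is easy to check before running a test. The helper below is a hypothetical sketch (the name `small_expected_cells` and the 24-roll scenario are invented for illustration); it uses the threshold of 5 given in the list.

```python
def small_expected_cells(expected, minimum=5):
    """Return (index, expected count) pairs that fall below the minimum."""
    return [(i, e) for i, e in enumerate(expected) if e < minimum]


# Hypothetical scenario: only 24 rolls of a six-sided die, so E = 4 per face.
expected_24_rolls = [24 / 6] * 6
# The 60-roll example: E = 10 per face.
expected_60_rolls = [60 / 6] * 6

for label, expected in [("24 rolls", expected_24_rolls), ("60 rolls", expected_60_rolls)]:
    problems = small_expected_cells(expected)
    if problems:
        print(f"{label}: {len(problems)} expected counts below 5; approximation unreliable")
    else:
        print(f"{label}: all expected counts are at least 5")
```

Note that the check applies to the expected counts, not the observed ones: an observed count of 0 is acceptable as long as the expected count for that category is large enough.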

Frequently Asked Questions

What should I learn before the chi-square test?

Before studying the chi-square test, you should understand hypothesis testing, p-values, and probability.

How Chi-Square Test Connects to Other Ideas

To understand the chi-square test, you should first be comfortable with hypothesis testing, p-values, and probability.