Statistics Concepts

65 concepts · Grades 3-5, 6-8, 9-12 · 98 prerequisite connections

This family view narrows the full math map to one connected cluster. Read it from left to right: earlier nodes support later ones, and densely connected nodes in the middle are usually the concepts that the most later work depends on.

Use the graph to plan review, then use the full concept list below to open precise pages for definitions, examples, formulas, and related mistake guides.

Concept Dependency Graph

Concepts flow left to right, from foundational to advanced.

Connected Families

Statistics concepts have 33 connections to other families.

Arithmetic · Functions · Probability · Algebra · Measurement

All Statistics Concepts

Mean

The arithmetic mean (average) of a data set is the sum of all values divided by the number of values.

6-8

Median

The median is the middle value of an ordered data set: half of the values are at or above it and half are at or below it. With an even number of values, it is the mean of the two middle values.

6-8

Mode

The mode is the value or values that appear most frequently in a data set — it is the most common or most popular data value.

6-8

Range (Statistics)

The statistical range is the difference between the maximum and minimum values in a data set: $\text{range} = \max - \min$.

6-8
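
As a quick illustration of the four summaries above (mean, median, mode, range), here is a minimal sketch using Python's standard `statistics` module. The data values are made up for the example.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of the most frequent value(s)
data_range = max(data) - min(data)  # max - min

print(mean, median, modes, data_range)
```

`multimode` returns a list because a data set can have more than one mode.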

Standard Deviation

The standard deviation measures the typical distance of data values from the mean. It is the square root of the variance, that is, the square root of the average squared deviation from the mean.

9-12

Variance

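The (population) variance is the average of the squared deviations from the mean: $\sigma^2 = \frac{1}{n}\sum (x_i - \mu)^2$. It is the square of the standard deviation. When estimating from a sample, the usual formula uses the sample mean $\bar{x}$ and divides by $n - 1$ instead of $n$.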

9-12
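
The variance and standard deviation formulas can be worked through by hand in a few lines. This sketch assumes the data are a whole population (dividing by $n$); the numbers are illustrative only.

```python
import math

# Hypothetical data, treated as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mu = sum(data) / n                               # population mean
variance = sum((x - mu) ** 2 for x in data) / n  # average squared deviation
std_dev = math.sqrt(variance)                    # square root of the variance

print(mu, variance, std_dev)
```

The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities, while `statistics.variance` and `statistics.stdev` divide by $n - 1$ for samples.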

Expected Value

The expected value of a random variable is the probability-weighted average of all possible outcomes — the long-run mean over many repetitions.

9-12
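
As a small worked case of a probability-weighted average, consider one roll of a fair six-sided die (an assumed example, not taken from the concept page). Exact fractions avoid rounding.

```python
from fractions import Fraction

# Assumed example: a fair die, so each outcome 1..6 has probability 1/6.
outcomes = range(1, 7)
prob = Fraction(1, 6)

# Expected value = sum of (outcome * probability of that outcome).
expected_value = sum(x * prob for x in outcomes)

print(expected_value)  # prints 7/2
```

No single roll ever shows 3.5; the expected value describes the long-run average over many rolls.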

Normal Distribution

The normal distribution (also called the Gaussian distribution or bell curve) is a continuous probability distribution that is symmetric about its mean and fully determined by its mean $\mu$ and standard deviation $\sigma$. About 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean.

9-12

Z-Score

A z-score measures how many standard deviations a data value is above or below the mean: $z = (x - \mu)/\sigma$.

9-12
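
The z-score formula translates directly into a one-line function. The helper name `z_score` and the test-score numbers are hypothetical, used only to show the arithmetic.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Assumed example: a score of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)
print(z)  # prints 1.5
```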

Permutation

A permutation is an ordered arrangement of objects — the number of ways to choose and order $r$ items from $n$ distinct items is $P(n,r) = \frac{n!}{(n-r)!}$.

9-12

Combination

A combination is an unordered selection of objects — the number of ways to choose $r$ items from $n$ distinct items is $C(n,r) = \frac{n!}{r!(n-r)!}$.

9-12

Factorial

The factorial of a non-negative integer $n$, written $n!$, is the product of all positive integers from 1 to $n$: $n! = n \cdot (n-1) \cdots 2 \cdot 1$. By convention, $0! = 1$.

9-12
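
The three counting formulas above (permutation, combination, factorial) are available in Python's `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The values $n = 5$ and $r = 2$ are arbitrary choices for illustration.

```python
import math

n, r = 5, 2  # arbitrary illustrative values

n_factorial = math.factorial(n)  # 5! = 5 * 4 * 3 * 2 * 1
permutations = math.perm(n, r)   # ordered selections:   n! / (n - r)!
combinations = math.comb(n, r)   # unordered selections: n! / (r! (n - r)!)

print(n_factorial, permutations, combinations)  # prints 120 20 10
```

Each unordered selection of $r$ items can be ordered in $r!$ ways, so $P(n,r) = C(n,r) \cdot r!$.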

Correlation

Correlation measures the strength and direction of the linear relationship between two quantitative variables, ranging from $-1$ to $+1$.

6-8
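
A minimal sketch of how a correlation value can be computed, assuming the Pearson correlation coefficient is the one meant here. The function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired data with a positive but imperfect linear trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # prints 0.7746
```

A perfectly linear increasing relationship, such as `ys = [2 * x for x in xs]`, gives a value of 1.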

Scatter Plot

A scatter plot is a graph with one quantitative variable on each axis where each data point is plotted as a dot, revealing relationships between the two variables.

6-8

Histogram

A histogram is a bar chart of a frequency distribution where bars represent count or density of data within consecutive equal-width intervals (bins).

6-8

Box Plot

A box plot displays the five-number summary (minimum, Q1, median, Q3, maximum) of a data set using a box and whiskers.

6-8

Quartiles

Quartiles divide an ordered data set into four equal parts: Q1 is the 25th percentile, Q2 is the median (50th), and Q3 is the 75th percentile.

6-8

Interquartile Range

The interquartile range (IQR) is $Q_3 - Q_1$, the spread of the middle 50% of the data. It is resistant to outliers.

6-8
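
Quartiles and the IQR can be computed with `statistics.quantiles`. Textbooks and software use several slightly different quartile conventions, so this sketch assumes the module's `"inclusive"` method; other methods can give different values on the same data.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # prints 3.0 5.0 7.0 4.0
```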

Data (Abstract)

Data is a collection of recorded observations or measurements used to describe, analyze, or make inferences about a phenomenon or population.

3-5

Measurement

Measurement is the process of assigning numerical values to attributes of objects or events according to a defined rule or scale.

3-5

Variability

Variability is the degree to which data points in a set differ from each other and from the center of the distribution.

6-8

Noise

Noise is random variation in data that is not explained by the underlying pattern or model — the unpredictable fluctuations around the true signal.

6-8

Signal vs Noise

Signal versus noise describes the fundamental challenge of separating meaningful patterns (signal) from random, unpredictable variation (noise) in data, a central task of statistical analysis.

6-8

Distribution (Intuition)

A distribution describes how data values are spread out across their range — which values occur, how often, and whether the data is symmetric or skewed.

6-8

Center vs Spread

Center and spread are two complementary ways to describe a data distribution. Center (mean, median, mode) tells you where values cluster; spread (range, interquartile range, standard deviation) tells you how far values are from that center. Together with the shape of the distribution, they summarize the main features of a data set.

6-8

Outliers (Deep)

An outlier is a data value that lies unusually far from most other values, potentially indicating measurement error, a rare event, or an important exception.

6-8

Dependence (Statistical)

Two events are statistically dependent when knowing one event occurred changes the probability of the other — formally, $P(B|A) \neq P(B)$, meaning the events share information.

6-8

Causation

Causation exists when one variable directly produces or influences a change in another variable — distinct from mere correlation or association.

9-12

Sampling Bias

Sampling bias occurs when the method of selecting a sample systematically over- or under-represents certain groups relative to their actual proportion in the population.

6-8

Representativeness

A sample is representative if its characteristics (distribution of key variables) closely match those of the population it is meant to represent.

6-8

Law of Large Numbers (Intuition)

The law of large numbers states that as the number of independent trials increases, the sample mean converges to the true population mean — randomness averages out over many repetitions.

9-12

Prediction

A prediction is a model-based estimate of an unknown or future value, accompanied by a measure of confidence or uncertainty.

6-8

Model Fit (Intuition)

Model fit describes how closely a statistical model's predictions match the observed data — measured by residuals, $R^2$, or loss functions.

9-12

Overfitting (Intuition)

Overfitting occurs when a model learns the noise in training data instead of just the underlying pattern, performing well on training data but poorly on new data.

9-12

Underfitting (Intuition)

Underfitting occurs when a model is too simple to capture the true pattern in the data, performing poorly on both training data and new data.

9-12

Data Visualization

Data visualization is the use of graphs, charts, and other visual representations to communicate patterns, trends, and relationships in data.

3-5

Misleading Graphs

A misleading graph is a data visualization that distorts the true pattern through truncated axes, unequal intervals, cherry-picked data, or manipulated scales.

6-8

Scale Distortion

Scale distortion occurs when a graph's axis does not start at zero or uses inconsistent intervals, making small differences appear large or large differences appear small.

6-8

Aggregation

Aggregation is the process of combining many individual data values into a single summary statistic such as a sum, mean, count, or proportion.

6-8

Normalization (Statistics)

Normalization rescales data to a standard range or distribution — such as $[0,1]$ or zero mean and unit variance — to make different variables comparable.

6-8
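
The two rescalings named above can be sketched as follows. The variable names and data are hypothetical, and the standardization here assumes the population standard deviation.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 6, 8]

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lo, hi = min(data), max(data)
min_max = [(x - lo) / (hi - lo) for x in data]

# Standardization: subtract the mean, divide by the standard deviation,
# giving zero mean and unit variance.
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
standardized = [(x - mu) / sigma for x in data]

print(min_max)
print(standardized)
```

Min-max scaling is sensitive to a single extreme value, since that value alone sets one end of the range.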

Proportional Data

Proportional data expresses quantities as fractions or percentages of a whole, enabling fair comparison across groups of different sizes.

6-8

Comparative Statistics

Comparative statistics involves using statistical measures to compare two or more groups, data sets, or distributions.

6-8

Probabilistic Thinking

Probabilistic thinking is the habit of reasoning about uncertain outcomes in terms of likelihood, expected value, and distributions rather than certainties.

6-8

Sampling Distribution

The probability distribution of a statistic (such as the sample mean) computed from all possible random samples of the same size drawn from a population.

9-12

Central Limit Theorem

For a sufficiently large sample size ($n \geq 30$ is a common rule of thumb), the sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, whatever the shape of the population distribution, provided the population has a finite standard deviation $\sigma$.

9-12
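
A short simulation can make the theorem concrete. This sketch assumes a strongly skewed population (exponential with mean 1 and standard deviation 1) and checks that the sample means center on $\mu$ with spread close to $\sigma/\sqrt{n}$. The sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 2000  # how many samples to draw

# Population: exponential with rate 1 (mean 1, SD 1, right-skewed).
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = 1.0 / math.sqrt(n)  # sigma / sqrt(n)

print(mean_of_means, sd_of_means, predicted_sd)
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is far from normal.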

Confidence Interval

A range of values, computed from sample data by a method that captures the true population parameter in a specified percentage of repeated samples (the confidence level, for example 95%).

9-12

Margin of Error

The maximum expected difference, at a given confidence level, between the sample statistic and the true population parameter; for a symmetric confidence interval it is half the interval's width.

9-12
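
To show how a confidence interval and its margin of error fit together, here is a sketch of a z-interval for a mean. It assumes the population standard deviation is known, which is a simplification; the sample numbers are invented.

```python
import math
from statistics import NormalDist

# Assumed inputs: sample mean, known population SD, sample size.
x_bar, sigma, n = 50.0, 10.0, 100
confidence = 0.95

# Critical value z*: leaves (1 - confidence) / 2 in each tail (about 1.96).
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_star * sigma / math.sqrt(n)       # margin of error
interval = (x_bar - margin, x_bar + margin)  # statistic +/- margin

print(round(margin, 2), interval)
```

Quadrupling the sample size halves the margin of error, because the margin shrinks with $\sqrt{n}$.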

Hypothesis Testing

A systematic method to decide whether sample data provides enough evidence to reject a claim (null hypothesis) about a population parameter.

9-12

P-Value

The probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming the null hypothesis $H_0$ is true.

9-12
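
As one concrete case, the sketch below computes a two-sided p-value for a z test statistic, assuming the statistic follows a standard normal distribution when $H_0$ is true. The function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. both tails combined."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Assumed example: an observed test statistic of z = 2.0.
p = two_sided_p_value(2.0)
print(round(p, 4))  # prints 0.0455
```

A test statistic farther from zero gives a smaller p-value, meaning the observed data would be more surprising if $H_0$ were true.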

Type I and Type II Errors

Type I error: rejecting $H_0$ when it is actually true (false positive); its probability is $\alpha$. Type II error: failing to reject $H_0$ when it is actually false (false negative); its probability is $\beta$.

9-12

Experimental Design

The deliberate planning of a study in which the researcher imposes treatments on subjects and measures responses, using control groups, randomization, replication, and (where possible) blinding to establish cause-and-effect relationships.

6-8

Observational vs Experimental Studies

An observational study records data without imposing treatments, while an experiment deliberately manipulates a variable. Only experiments with random assignment can establish causation; observational studies can only show association.

6-8

Sampling Methods

Systematic approaches for selecting a subset of individuals from a population. The main probability methods are: simple random sample (SRS), stratified random sample, cluster sample, and systematic sample. Convenience sampling is a non-probability method that is generally biased.

6-8

Chi-Square Test

A hypothesis test that compares observed frequencies to expected frequencies using the chi-square statistic to assess independence or goodness of fit.

9-12
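
The chi-square statistic itself is a short sum, $\chi^2 = \sum \frac{(O - E)^2}{E}$. This sketch computes it for a goodness-of-fit setting with invented counts; the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented counts: 60 observations over 3 categories expected to be equal.
observed = [18, 22, 20]
expected = [20, 20, 20]

chi_sq = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # goodness of fit: categories - 1

print(chi_sq, degrees_of_freedom)
```

Turning the statistic into a p-value requires the chi-square distribution with the given degrees of freedom, which is not covered by this sketch.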

Least Squares Regression Line

The unique straight line $\hat{y} = a + bx$ that minimizes the sum of squared vertical distances (residuals) between the observed data points and the line.

9-12

Residuals

The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

9-12

Coefficient of Determination

The proportion of the total variation in the response variable $y$ that is explained by the linear relationship with the explanatory variable $x$. In simple linear regression it equals the square of the correlation coefficient: $r^2$.

9-12
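
The last three entries (least squares line, residuals, coefficient of determination) can be traced through one small example. This sketch uses the standard closed-form slope and intercept for a single explanatory variable; the data are made up.

```python
# Made-up paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b = sxy / sxx            # slope
a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted

sse = sum(e ** 2 for e in residuals)        # unexplained variation
sst = sum((y - mean_y) ** 2 for y in ys)    # total variation in y
r_squared = 1 - sse / sst                   # proportion explained

print(a, b, r_squared)
```

For a least squares line with an intercept, the residuals always sum to zero (up to rounding).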

Inference for Regression

Using hypothesis tests and confidence intervals to draw conclusions about the true population slope $\beta_1$ of the linear relationship $y = \beta_0 + \beta_1 x + \varepsilon$, based on sample data.

9-12

Power of a Test

The probability that a hypothesis test correctly rejects a false null hypothesis. Power $= P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$, where $\beta$ is the probability of a Type II error.

9-12

Paired t-Test

A hypothesis test for the mean difference in a paired (matched) data design, where each subject provides two related measurements. The test analyzes the differences $d_i = x_{1i} - x_{2i}$ as a single sample.

9-12
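
A sketch of the paired t statistic, computed from the differences $d_i = x_{1i} - x_{2i}$ as a single sample: $t = \bar{d} / (s_d / \sqrt{n})$. The measurements are invented, and the sketch stops at the statistic rather than a p-value.

```python
import math
import statistics

# Invented paired measurements for four subjects.
x1 = [10, 12, 9, 11]
x2 = [12, 14, 10, 13]

diffs = [u - v for u, v in zip(x1, x2)]  # d_i = x1_i - x2_i
n = len(diffs)

d_bar = statistics.mean(diffs)  # mean difference
s_d = statistics.stdev(diffs)   # sample SD of the differences (n - 1)

t = d_bar / (s_d / math.sqrt(n))
degrees_of_freedom = n - 1

print(t, degrees_of_freedom)  # prints -7.0 3
```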

Two-Sample Tests

Hypothesis tests and confidence intervals for comparing parameters (means or proportions) of two independent populations. The two-sample t-test compares means; the two-proportion z-test compares proportions.

9-12

Mean Absolute Deviation

The average distance between each data value and the mean of the data set. Calculated by finding the mean, computing the absolute value of each deviation from the mean, and averaging those absolute deviations.

6-8
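
The three steps in the definition (find the mean, take absolute deviations, average them) can be sketched as follows. The function name `mean_absolute_deviation` and the data are hypothetical.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of each value from the mean."""
    mean = sum(data) / len(data)                   # step 1: the mean
    abs_devs = [abs(x - mean) for x in data]       # step 2: absolute deviations
    return sum(abs_devs) / len(abs_devs)           # step 3: their average

# Made-up data: mean is 5, absolute deviations are 3, 1, 1, 3.
mad = mean_absolute_deviation([2, 4, 6, 8])
print(mad)  # prints 2.0
```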

Two-Way Tables

A table that displays frequencies for two categorical variables simultaneously, organized with one variable in rows and the other in columns. It shows joint frequencies (individual cells), marginal frequencies (row/column totals), and enables calculation of conditional frequencies.

6-8
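
To illustrate joint, marginal, and conditional frequencies, here is a small two-way table stored as a nested dictionary. The categories and counts are invented for the example.

```python
# Invented counts: grade level (rows) by how students get to school (columns).
table = {
    "7th": {"walk": 10, "bus": 30},
    "8th": {"walk": 20, "bus": 40},
}

# Marginal frequencies: row totals, column totals, and the grand total.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {
    col: sum(table[row][col] for row in table) for col in ["walk", "bus"]
}
grand_total = sum(row_totals.values())

# Joint relative frequency: one cell divided by the grand total.
joint_7th_walk = table["7th"]["walk"] / grand_total

# Conditional relative frequency: one cell divided by its row total.
walk_given_7th = table["7th"]["walk"] / row_totals["7th"]

print(row_totals, col_totals, joint_7th_walk, walk_given_7th)
```

Comparing `walk_given_7th` with the corresponding value for 8th graders is one way to look for an association between the two variables.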

Line Plots

A line plot (dot plot) displays data by placing marks (dots or Xs) above a number line to show the frequency of each value.

3-5

Population vs Sample

A population is the entire group you want to study. A sample is a smaller subset of that population that you actually collect data from.

6-8
