Statistics & Probability
80 concepts in Math
Statistics and probability equip students to collect, analyze, and draw conclusions from data: skills that are increasingly vital in a world driven by information. Statistics covers how to summarize data using measures of center (mean, median, mode) and spread (range, standard deviation), how to create and interpret graphs, and how to recognize patterns and outliers. Probability provides a mathematical framework for quantifying uncertainty and predicting outcomes. Students learn to calculate theoretical and experimental probabilities, use tree diagrams and sample spaces, and understand independent versus dependent events. Together, these ideas help students evaluate claims, understand risk, interpret polls and studies, and make informed decisions. This topic connects naturally to science, social studies, health, and personal finance.
Suggested learning path: Begin with data collection and graphical displays, then study measures of center and spread, move into basic probability, and finally explore sampling methods and introductory inference.
Mean
The arithmetic mean (average) of a data set is the sum of all values divided by the number of values.
Median
The median is the middle value of an ordered data set: half of the values lie at or below it and half lie at or above it.
Mode
The mode is the value or values that appear most frequently in a data set; it is the most common or most popular data value.
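All three measures of center can be computed with Python's standard library; the data values below are made up for illustration:

```python
from statistics import mean, median, multimode

# Hypothetical data set (7 values).
data = [4, 2, 7, 4, 9, 4, 2]

print(mean(data))       # sum of values / number of values = 32/7
print(median(data))     # middle value of the sorted data
print(multimode(data))  # most frequent value(s), as a list
```

`multimode` returns a list because a data set can have more than one mode (or every value exactly once).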
Range (Statistics)
The statistical range is the difference between the maximum and minimum values in a data set: $\text{range} = \max - \min$.
Standard Deviation
The standard deviation measures the average distance of data values from the mean, giving a typical spread around the center.
Variance
The variance is the average of the squared deviations from the mean: $\sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$. It is the square of the standard deviation.
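A minimal sketch of the formula above (population variance, dividing by $n$), with hypothetical values chosen so the numbers come out whole:

```python
import math

# Population variance: average squared deviation from the mean.
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
x_bar = sum(data) / n                                  # mean = 5
variance = sum((x - x_bar) ** 2 for x in data) / n     # sigma^2
std_dev = math.sqrt(variance)                          # sigma

print(variance, std_dev)
```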
Probability
Probability is a number between 0 and 1 (inclusive) that measures how likely an event is to occur, where 0 means impossible and 1 means certain.
Sample Space
The sample space $S$ is the set of all possible outcomes of a random experiment: every outcome that could conceivably occur.
Independent Events
Two events are independent if the occurrence of one does not change the probability of the other: $P(A \cap B) = P(A) \cdot P(B)$.
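For instance, with two fair dice, an event about the first die and an event about the second are independent. The sketch below (the specific events are illustrative) checks the multiplication rule by enumerating the 36-outcome sample space with exact fractions:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
space = list(product(range(1, 7), repeat=2))

A = {o for o in space if o[0] % 2 == 0}   # first die is even
B = {o for o in space if o[1] >= 5}       # second die shows 5 or 6

p_A = Fraction(len(A), len(space))        # 1/2
p_B = Fraction(len(B), len(space))        # 1/3
p_AB = Fraction(len(A & B), len(space))   # 1/6

# Independence: P(A and B) equals P(A) * P(B).
print(p_A, p_B, p_AB, p_AB == p_A * p_B)
```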
Conditional Probability
The conditional probability $P(A|B)$ is the probability of event $A$ occurring given that event $B$ has already occurred.
Expected Value
The expected value of a random variable is the probability-weighted average of all possible outcomes: the long-run mean over many repetitions.
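A worked example: the expected value of one roll of a fair six-sided die, computed as the probability-weighted average:

```python
from fractions import Fraction

# Each outcome 1..6 of a fair die has probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)
expected = sum(x * p for x in outcomes)

print(expected)  # 7/2, i.e. 3.5
```

Note that 3.5 is not a possible outcome of any single roll; the expected value is a long-run average, not a prediction of one trial.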
Normal Distribution
The normal distribution (also called the Gaussian distribution or bell curve) is a continuous probability distribution that is symmetric about its mean, with data tapering off equally on both sides following a precise mathematical rule.
Z-Score
A z-score measures how many standard deviations a data value is above or below the mean: $z = (x - \mu)/\sigma$.
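A small sketch of the formula, assuming hypothetical scores with mean 70 and standard deviation 10:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical test scores with mu = 70, sigma = 10.
print(z_score(85, 70, 10))  # 1.5 standard deviations above the mean
print(z_score(55, 70, 10))  # 1.5 standard deviations below the mean
```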
Permutation
A permutation is an ordered arrangement of objects; the number of ways to choose and order $r$ items from $n$ distinct items is $P(n,r) = \frac{n!}{(n-r)!}$.
Combination
A combination is an unordered selection of objects; the number of ways to choose $r$ items from $n$ distinct items is $C(n,r) = \frac{n!}{r!(n-r)!}$.
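Both counting formulas are available directly in Python's `math` module; the values $n = 5$, $r = 3$ below are illustrative:

```python
import math

n, r = 5, 3

# Ordered arrangements: P(n, r) = n! / (n - r)!
perms = math.perm(n, r)
# Unordered selections: C(n, r) = n! / (r! (n - r)!)
combs = math.comb(n, r)

print(perms, combs)  # 60 10
```

Each combination of 3 items can be ordered in $3! = 6$ ways, which is why $P(5,3) = 6 \cdot C(5,3)$.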
Factorial
The factorial of a non-negative integer $n$, written $n!$, is the product of all positive integers from 1 to $n$: $n! = n \cdot (n-1) \cdots 2 \cdot 1$.
Correlation
Correlation measures the strength and direction of the linear relationship between two quantitative variables, ranging from $-1$ to $+1$.
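Pearson's correlation coefficient can be computed straight from its definition; the paired values below are hypothetical:

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values

mx, my = sum(x) / len(x), sum(y) / len(y)
# r = sum of products of deviations, scaled so that -1 <= r <= 1.
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
r = num / den

print(r)  # about 0.775: a fairly strong positive linear relationship
```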
Scatter Plot
A scatter plot is a graph with one quantitative variable on each axis where each data point is plotted as a dot, revealing relationships between the two variables.
Histogram
A histogram displays a frequency distribution with adjacent bars representing the count or density of data within consecutive equal-width intervals (bins).
Box Plot
A box plot displays the five-number summary (minimum, Q1, median, Q3, maximum) of a data set using a box and whiskers.
Quartiles
Quartiles divide an ordered data set into four equal parts: Q1 is the 25th percentile, Q2 is the median (50th), and Q3 is the 75th percentile.
Interquartile Range
The interquartile range (IQR) is $Q3 - Q1$: the spread of the middle 50% of the data, resistant to outliers.
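A sketch using Python's `statistics` module. Textbooks differ slightly on quartile conventions; the `"inclusive"` method here is one common choice, and the data values are illustrative:

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical ordered data set

# Cut points that divide the data into four equal parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```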
Data (Abstract)
Data is a collection of recorded observations or measurements used to describe, analyze, or make inferences about a phenomenon or population.
Measurement
Measurement is the process of assigning numerical values to attributes of objects or events according to a defined rule or scale.
Variability
Variability is the degree to which data points in a set differ from each other and from the center of the distribution.
Noise
Noise is random variation in data that is not explained by the underlying pattern or model: the unpredictable fluctuations around the true signal.
Signal vs Noise
Distinguishing meaningful patterns (signal) from random variation (noise) in data.
Distribution (Intuition)
A distribution describes how data values are spread out across their range: which values occur, how often, and whether the data is symmetric or skewed.
Center vs Spread
Center versus spread describes two complementary aspects of any data distribution: center (mean, median) tells you where the typical value lies, while spread (range, IQR, standard deviation) tells you how much the values vary around that center.
Outliers (Deep)
An outlier is a data value that lies unusually far from most other values, potentially indicating measurement error, a rare event, or an important exception.
Randomness
The quality of having no predictable pattern; outcomes are uncertain but follow probability rules.
Chance
Chance describes the inherent randomness in outcomes of experiments: the fact that even with complete knowledge, some events cannot be predicted with certainty.
Probability as Expectation
Probability can be interpreted as the long-run relative frequency of an event over infinitely many identical trials of a random experiment.
Events (Formal)
A formal event is a subset of the sample space: a collection of outcomes to which a probability is assigned; events can be simple (one outcome) or compound (many outcomes).
Dependence (Statistical)
When the probability of one event changes based on whether another event occurred.
Causation
Causation exists when one variable directly produces or influences a change in another variable, distinct from mere correlation or association.
Sampling Bias
Sampling bias occurs when the method of selecting a sample systematically over- or under-represents certain groups relative to their actual proportion in the population.
Representativeness
A sample is representative if its characteristics (distribution of key variables) closely match those of the population it is meant to represent.
Law of Large Numbers (Intuition)
As sample size increases, the sample average approaches the true population average.
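A quick simulation of this idea with a fair coin (seeded for reproducibility):

```python
import random

random.seed(42)  # make the simulation reproducible

# Simulate fair-coin flips; the proportion of heads should settle
# near the true probability 0.5 as the number of flips grows.
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))

print(heads / n)  # close to 0.5
```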
Risk
The possibility of loss or negative outcome, often quantified by probability and severity.
Uncertainty
Uncertainty is the state of having incomplete or imperfect information about a quantity, outcome, or process, making precise prediction impossible.
Prediction
A prediction is a model-based estimate of an unknown or future value, accompanied by a measure of confidence or uncertainty.
Model Fit (Intuition)
Model fit describes how closely a statistical model's predictions match the observed data โ measured by residuals, $R^2$, or loss functions.
Overfitting (Intuition)
Overfitting occurs when a model learns the noise in training data instead of just the underlying pattern, performing well on training data but poorly on new data.
Underfitting (Intuition)
Underfitting occurs when a model is too simple to capture the true pattern in the data, performing poorly on both training data and new data.
Data Visualization
Data visualization is the use of graphs, charts, and other visual representations to communicate patterns, trends, and relationships in data.
Misleading Graphs
A misleading graph is a data visualization that distorts the true pattern through truncated axes, unequal intervals, cherry-picked data, or manipulated scales.
Scale Distortion
Scale distortion occurs when a graph's axis does not start at zero or uses inconsistent intervals, making small differences appear large or large differences appear small.
Aggregation
Aggregation is the process of combining many individual data values into a single summary statistic such as a sum, mean, count, or proportion.
Normalization (Statistics)
Normalization rescales data to a standard range or distribution, such as $[0,1]$ or zero mean and unit variance, to make different variables comparable.
Proportional Data
Proportional data expresses quantities as fractions or percentages of a whole, enabling fair comparison across groups of different sizes.
Comparative Statistics
Comparative statistics involves using statistical measures to compare two or more groups, data sets, or distributions.
Probabilistic Thinking
Probabilistic thinking is the habit of reasoning about uncertain outcomes in terms of likelihood, expected value, and distributions rather than certainties.
Decision Under Uncertainty
Decision under uncertainty involves choosing between options whose outcomes are not known for certain, typically by comparing expected values or risk profiles.
Binomial Coefficient
The number of ways to choose $k$ items from $n$ items, written $C(n, k)$ or $\binom{n}{k}$.
Binomial Distribution
The probability distribution of the number of successes in $n$ independent yes/no trials, each with probability $p$.
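The binomial probability mass function follows directly from the binomial coefficient; a small sketch with a fair-coin example:

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 fair-coin flips: C(5,3) / 2^5 = 10/32.
print(binom_pmf(3, 5, 0.5))  # 0.3125
```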
Sampling Distribution
The probability distribution of a statistic (such as the sample mean) computed from all possible random samples of the same size drawn from a population.
Central Limit Theorem
For sufficiently large sample size ($n \geq 30$ as a rule of thumb), the sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, regardless of the shape of the population distribution.
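A simulation sketch: even when the population is skewed (exponential here, with mean 1 and standard deviation 1), means of samples of size 30 pile up around the population mean with spread close to $\sigma/\sqrt{n}$. The number of repetitions is illustrative:

```python
import random
from statistics import mean, stdev

random.seed(0)  # reproducible simulation

# Population: exponential with mean 1 and sigma 1 (clearly not normal).
n, reps = 30, 2000
sample_means = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(mean(sample_means))   # near the population mean, 1.0
print(stdev(sample_means))  # near sigma / sqrt(n) = 1 / sqrt(30), about 0.18
```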
Confidence Interval
A range of values, computed from sample data, that is likely to contain the true population parameter with a specified level of confidence.
Margin of Error
The maximum expected difference between the sample statistic and the true population parameter; it is half the width of a confidence interval.
Hypothesis Testing
A systematic method to decide whether sample data provides enough evidence to reject a claim (null hypothesis) about a population parameter.
P-Value
The probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming the null hypothesis $H_0$ is true.
Type I and Type II Errors
Type I error ($\alpha$): rejecting $H_0$ when it is actually true (false positive). Type II error ($\beta$): failing to reject $H_0$ when it is actually false (false negative).
Experimental Design
The deliberate planning of a study in which the researcher imposes treatments on subjects and measures responses, using control groups, randomization, replication, and (where possible) blinding to establish cause-and-effect relationships.
Observational vs Experimental Studies
An observational study records data without imposing treatments, while an experiment deliberately manipulates a variable. Only experiments with random assignment can establish causation; observational studies can only show association.
Sampling Methods
Systematic approaches for selecting a subset of individuals from a population. The main probability methods are: simple random sample (SRS), stratified random sample, cluster sample, and systematic sample. Convenience sampling is a non-probability method that is generally biased.
Geometric Distribution
The probability distribution for the number of independent Bernoulli trials needed to get the first success, where each trial has success probability $p$.
Chi-Square Test
A family of hypothesis tests that use the chi-square statistic to compare observed frequencies to expected frequencies. The three main types are: goodness-of-fit (does data match a claimed distribution?), test of independence (are two categorical variables related?), and test of homogeneity (do different populations have the same distribution?).
Least Squares Regression Line
The unique straight line $\hat{y} = a + bx$ that minimizes the sum of squared vertical distances (residuals) between the observed data points and the line.
Residuals
The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).
Coefficient of Determination
The proportion of the total variation in the response variable $y$ that is explained by the linear relationship with the explanatory variable $x$. It equals the square of the correlation coefficient: $r^2$.
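The regression quantities above (the least-squares line, its residuals, and $r^2$) can be computed from their defining formulas; the data points below are hypothetical:

```python
x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values

mx, my = sum(x) / len(x), sum(y) / len(y)
# Slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2); intercept a = y_bar - b*x_bar.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot  # fraction of variation in y explained by x

print(b, a, r_squared)
```

The residuals of a least-squares fit always sum to zero, which the test below also checks.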
Inference for Regression
Using hypothesis tests and confidence intervals to draw conclusions about the true population slope $\beta_1$ of the linear relationship $y = \beta_0 + \beta_1 x + \varepsilon$, based on sample data.
Power of a Test
The probability that a hypothesis test correctly rejects a false null hypothesis. Power $= P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$, where $\beta$ is the probability of a Type II error.
Paired t-Test
A hypothesis test for the mean difference in a paired (matched) data design, where each subject provides two related measurements. The test analyzes the differences $d_i = x_{1i} - x_{2i}$ as a single sample.
Two-Sample Tests
Hypothesis tests and confidence intervals for comparing parameters (means or proportions) of two independent populations. The two-sample t-test compares means; the two-proportion z-test compares proportions.
Compound Probability
The probability of two or more events occurring together ($P(A \text{ and } B)$) or at least one occurring ($P(A \text{ or } B)$), accounting for whether the events are independent or dependent.
Experimental vs. Theoretical Probability
Theoretical probability is calculated from known outcomes ($P = \frac{\text{favorable}}{\text{total}}$), while experimental probability is estimated from actual trials ($P \approx \frac{\text{times event occurred}}{\text{total trials}}$). As the number of trials increases, experimental probability tends to approach theoretical probability.
Mean Absolute Deviation
The average distance between each data value and the mean of the data set. Calculated by finding the mean, computing the absolute value of each deviation from the mean, and averaging those absolute deviations.
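The three steps can be sketched directly; the data values are made up so the arithmetic is easy to follow:

```python
data = [2, 5, 7, 10]  # hypothetical values

m = len(data) and sum(data) / len(data)               # step 1: the mean (6.0)
deviations = [abs(x - m) for x in data]               # step 2: |x - mean| for each value
mad = sum(deviations) / len(deviations)               # step 3: average the absolute deviations

print(mad)  # (4 + 1 + 1 + 4) / 4 = 2.5
```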
Two-Way Tables
A table that displays frequencies for two categorical variables simultaneously, organized with one variable in rows and the other in columns. It shows joint frequencies (individual cells), marginal frequencies (row/column totals), and enables calculation of conditional frequencies.
Bayes' Theorem
Bayes' theorem gives the posterior probability of a hypothesis given evidence: $P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}$.
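A classic worked example (all rates hypothetical): screening for a rare condition, where even a fairly accurate test yields mostly false positives:

```python
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_h = 0.01            # prior P(H): has the condition
p_e_given_h = 0.95    # P(E|H): test is positive given the condition
p_e_given_not_h = 0.05  # P(E|not H): false-positive rate

# Total probability of a positive test: P(E) = P(E|H)P(H) + P(E|not H)P(not H).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: posterior P(H|E) = P(E|H) P(H) / P(E).
p_h_given_e = p_e_given_h * p_h / p_e

print(p_h_given_e)  # about 0.16: most positive results are false positives
```

The counterintuitive result comes from the low prior: the 5% false-positive rate applied to the 99% of people without the condition swamps the true positives.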