Statistics & Probability
80 concepts in Math
Statistics and probability equip students to collect, analyze, and draw conclusions from data: skills that are increasingly vital in a world driven by information. Statistics covers how to summarize data using measures of center (mean, median, mode) and spread (range, standard deviation), how to create and interpret graphs, and how to recognize patterns and outliers. Probability provides a mathematical framework for quantifying uncertainty and predicting outcomes. Students learn to calculate theoretical and experimental probabilities, use tree diagrams and sample spaces, and understand independent versus dependent events. Together, these ideas help students evaluate claims, understand risk, interpret polls and studies, and make informed decisions. This topic connects naturally to science, social studies, health, and personal finance.
Suggested learning path: Begin with data collection and graphical displays, then study measures of center and spread, move into basic probability, and finally explore sampling methods and introductory inference.
Mean
The arithmetic mean (average) of a data set is the sum of all values divided by the number of values.
Median
The median is the middle value of an ordered data set: half of the values lie at or below it and half lie at or above it.
Mode
The mode is the value or values that appear most frequently in a data set; it is the most common or most popular data value.
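All three measures of center can be computed with Python's standard library; the data values below are made up for illustration:

```python
from statistics import mean, median, multimode

# Hypothetical data set (7 values).
data = [4, 2, 7, 4, 9, 4, 2]

print(mean(data))       # sum of values / number of values = 32/7
print(median(data))     # middle value of the sorted data
print(multimode(data))  # most frequent value(s), as a list
```

`multimode` returns a list because a data set can have more than one mode (or every value exactly once).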
Range (Statistics)
The statistical range is the difference between the maximum and minimum values in a data set: $\text{range} = \max - \min$.
Standard Deviation
The standard deviation measures the average distance of data values from the mean, giving a typical spread around the center.
Variance
The variance is the average of the squared deviations from the mean: $\sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$. It is the square of the standard deviation.
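A minimal sketch of the formula above (population variance, dividing by $n$), with hypothetical values chosen so the numbers come out whole:

```python
import math

# Population variance: average squared deviation from the mean.
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)
x_bar = sum(data) / n                                  # mean = 5
variance = sum((x - x_bar) ** 2 for x in data) / n     # sigma^2
std_dev = math.sqrt(variance)                          # sigma

print(variance, std_dev)
```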
Probability
Probability is a number between 0 and 1 (inclusive) that measures how likely an event is to occur, where 0 means impossible and 1 means certain.
Sample Space
The sample space $S$ is the set of all possible outcomes of a random experiment: every outcome that could conceivably occur.
Independent Events
Two events are independent if the occurrence of one does not change the probability of the other: $P(A \cap B) = P(A) \cdot P(B)$.
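For instance, with two fair dice, an event about the first die and an event about the second are independent. The sketch below (the specific events are illustrative) checks the multiplication rule by enumerating the 36-outcome sample space with exact fractions:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
space = list(product(range(1, 7), repeat=2))

A = {o for o in space if o[0] % 2 == 0}   # first die is even
B = {o for o in space if o[1] >= 5}       # second die shows 5 or 6

p_A = Fraction(len(A), len(space))        # 1/2
p_B = Fraction(len(B), len(space))        # 1/3
p_AB = Fraction(len(A & B), len(space))   # 1/6

# Independence: P(A and B) equals P(A) * P(B).
print(p_A, p_B, p_AB, p_AB == p_A * p_B)
```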
Conditional Probability
The conditional probability $P(A|B)$ is the probability of event $A$ occurring given that event $B$ has already occurred.
Expected Value
The expected value of a random variable is the probability-weighted average of all possible outcomes: the long-run mean over many repetitions.
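A worked example: the expected value of one roll of a fair six-sided die, computed as the probability-weighted average:

```python
from fractions import Fraction

# Each outcome 1..6 of a fair die has probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)
expected = sum(x * p for x in outcomes)

print(expected)  # 7/2, i.e. 3.5
```

Note that 3.5 is not a possible outcome of any single roll; the expected value is a long-run average, not a prediction of one trial.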
Normal Distribution
The normal distribution (also called the Gaussian distribution or bell curve) is a continuous probability distribution that is symmetric about its mean, with data tapering off equally on both sides following a precise mathematical rule.
Z-Score
A z-score measures how many standard deviations a data value is above or below the mean: $z = (x - \mu)/\sigma$.
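A small sketch of the formula, assuming hypothetical scores with mean 70 and standard deviation 10:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical test scores with mu = 70, sigma = 10.
print(z_score(85, 70, 10))  # 1.5 standard deviations above the mean
print(z_score(55, 70, 10))  # 1.5 standard deviations below the mean
```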
Permutation
A permutation is an ordered arrangement of objects; the number of ways to choose and order $r$ items from $n$ distinct items is $P(n,r) = \frac{n!}{(n-r)!}$.
Combination
A combination is an unordered selection of objects; the number of ways to choose $r$ items from $n$ distinct items is $C(n,r) = \frac{n!}{r!(n-r)!}$.
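Both counting formulas are available directly in Python's `math` module; the values $n = 5$, $r = 3$ below are illustrative:

```python
import math

n, r = 5, 3

# Ordered arrangements: P(n, r) = n! / (n - r)!
perms = math.perm(n, r)
# Unordered selections: C(n, r) = n! / (r! (n - r)!)
combs = math.comb(n, r)

print(perms, combs)  # 60 10
```

Each combination of 3 items can be ordered in $3! = 6$ ways, which is why $P(5,3) = 6 \cdot C(5,3)$.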
Factorial
The factorial of a non-negative integer $n$, written $n!$, is the product of all positive integers from 1 to $n$: $n! = n \cdot (n-1) \cdots 2 \cdot 1$.
Correlation
Correlation measures the strength and direction of the linear relationship between two quantitative variables, ranging from $-1$ to $+1$.
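Pearson's correlation coefficient can be computed straight from its definition; the paired values below are hypothetical:

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values

mx, my = sum(x) / len(x), sum(y) / len(y)
# r = sum of products of deviations, scaled so that -1 <= r <= 1.
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
r = num / den

print(r)  # about 0.775: a fairly strong positive linear relationship
```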
Scatter Plot
A scatter plot is a graph with one quantitative variable on each axis where each data point is plotted as a dot, revealing relationships between the two variables.
Histogram
A histogram displays a frequency distribution with adjacent bars representing the count or density of data within consecutive equal-width intervals (bins).
Box Plot
A box plot displays the five-number summary (minimum, Q1, median, Q3, maximum) of a data set using a box and whiskers.
Quartiles
Quartiles divide an ordered data set into four equal parts: Q1 is the 25th percentile, Q2 is the median (50th), and Q3 is the 75th percentile.
Interquartile Range
The interquartile range (IQR) is $Q3 - Q1$: the spread of the middle 50% of the data, resistant to outliers.
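A sketch using Python's `statistics` module. Textbooks differ slightly on quartile conventions; the `"inclusive"` method here is one common choice, and the data values are illustrative:

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical ordered data set

# Cut points that divide the data into four equal parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```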
Data (Abstract)
Data is a collection of recorded observations or measurements used to describe, analyze, or make inferences about a phenomenon or population.
Measurement
Measurement is the process of assigning numerical values to attributes of objects or events according to a defined rule or scale.
Variability
Variability is the degree to which data points in a set differ from each other and from the center of the distribution.
Noise
Noise is random variation in data that is not explained by the underlying pattern or model: the unpredictable fluctuations around the true signal.
Signal vs Noise
Distinguishing meaningful patterns (signal) from random variation (noise) in data.
Distribution (Intuition)
A distribution describes how data values are spread out across their range: which values occur, how often, and whether the data is symmetric or skewed.
Center vs Spread
Center versus spread describes two complementary aspects of any data distribution: center (mean, median) tells you where the typical value lies, while spread (range, IQR, standard deviation) tells you how much the values vary around that center.
Outliers (Deep)
An outlier is a data value that lies unusually far from most other values, potentially indicating measurement error, a rare event, or an important exception.
Randomness
The quality of having no predictable pattern; outcomes are uncertain but follow probability rules.
Chance
Chance describes the inherent randomness in outcomes of experiments: the fact that even with complete knowledge, some events cannot be predicted with certainty.
Probability as Expectation
Probability can be interpreted as the long-run relative frequency of an event over infinitely many identical trials of a random experiment.
Events (Formal)
A formal event is a subset of the sample space: a collection of outcomes to which a probability is assigned; events can be simple (one outcome) or compound (many outcomes).
Dependence (Statistical)
When the probability of one event changes based on whether another event occurred.
Causation
Causation exists when one variable directly produces or influences a change in another variable, distinct from mere correlation or association.
Sampling Bias
Sampling bias occurs when the method of selecting a sample systematically over- or under-represents certain groups relative to their actual proportion in the population.
Representativeness
A sample is representative if its characteristics (distribution of key variables) closely match those of the population it is meant to represent.
Law of Large Numbers (Intuition)
As sample size increases, the sample average approaches the true population average.
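A quick simulation of this idea with a fair coin (seeded for reproducibility):

```python
import random

random.seed(42)  # make the simulation reproducible

# Simulate fair-coin flips; the proportion of heads should settle
# near the true probability 0.5 as the number of flips grows.
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))

print(heads / n)  # close to 0.5
```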
Risk
The possibility of loss or negative outcome, often quantified by probability and severity.
Uncertainty
Uncertainty is the state of having incomplete or imperfect information about a quantity, outcome, or process, making precise prediction impossible.
Prediction
A prediction is a model-based estimate of an unknown or future value, accompanied by a measure of confidence or uncertainty.
Model Fit (Intuition)
Model fit describes how closely a statistical model's predictions match the observed data โ measured by residuals, $R^2$, or loss functions.
Overfitting (Intuition)
Overfitting occurs when a model learns the noise in training data instead of just the underlying pattern, performing well on training data but poorly on new data.
Underfitting (Intuition)
Underfitting occurs when a model is too simple to capture the true pattern in the data, performing poorly on both training data and new data.
Data Visualization
Data visualization is the use of graphs, charts, and other visual representations to communicate patterns, trends, and relationships in data.
Misleading Graphs
A misleading graph is a data visualization that distorts the true pattern through truncated axes, unequal intervals, cherry-picked data, or manipulated scales.
Scale Distortion
Scale distortion occurs when a graph's axis does not start at zero or uses inconsistent intervals, making small differences appear large or large differences appear small.
Aggregation
Aggregation is the process of combining many individual data values into a single summary statistic such as a sum, mean, count, or proportion.
Normalization (Statistics)
Normalization rescales data to a standard range or distribution, such as $[0,1]$ or zero mean and unit variance, to make different variables comparable.
Proportional Data
Proportional data expresses quantities as fractions or percentages of a whole, enabling fair comparison across groups of different sizes.
Comparative Statistics
Comparative statistics involves using statistical measures to compare two or more groups, data sets, or distributions.
Probabilistic Thinking
Probabilistic thinking is the habit of reasoning about uncertain outcomes in terms of likelihood, expected value, and distributions rather than certainties.
Decision Under Uncertainty
Decision under uncertainty involves choosing between options whose outcomes are not known for certain, typically by comparing expected values or risk profiles.
Binomial Coefficient
The number of ways to choose $k$ items from $n$ items, written $C(n, k)$ or $\binom{n}{k}$.
Binomial Distribution
The probability distribution of the number of successes in $n$ independent yes/no trials, each with probability $p$.
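The binomial probability mass function follows directly from the binomial coefficient; a small sketch with a fair-coin example:

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 fair-coin flips: C(5,3) / 2^5 = 10/32.
print(binom_pmf(3, 5, 0.5))  # 0.3125
```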
Sampling Distribution
The probability distribution of a statistic (such as the sample mean) computed from all possible random samples of the same size drawn from a population.
Central Limit Theorem
For sufficiently large sample size ($n \geq 30$ as a rule of thumb), the sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, regardless of the shape of the population distribution.
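A simulation sketch: even when the population is skewed (exponential here, with mean 1 and standard deviation 1), means of samples of size 30 pile up around the population mean with spread close to $\sigma/\sqrt{n}$. The number of repetitions is illustrative:

```python
import random
from statistics import mean, stdev

random.seed(0)  # reproducible simulation

# Population: exponential with mean 1 and sigma 1 (clearly not normal).
n, reps = 30, 2000
sample_means = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(mean(sample_means))   # near the population mean, 1.0
print(stdev(sample_means))  # near sigma / sqrt(n) = 1 / sqrt(30), about 0.18
```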
Confidence Interval
A range of values, computed from sample data, that is likely to contain the true population parameter with a specified level of confidence.
Margin of Error
The maximum expected difference between the sample statistic and the true population parameter; it is half the width of a confidence interval.
Hypothesis Testing
A systematic method to decide whether sample data provides enough evidence to reject a claim (null hypothesis) about a population parameter.
P-Value
The probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming the null hypothesis $H_0$ is true.
Type I and Type II Errors
Type I error ($\alpha$): rejecting $H_0$ when it is actually true (false positive). Type II error ($\beta$): failing to reject $H_0$ when it is actually false (false negative).
Experimental Design
The deliberate planning of a study in which the researcher imposes treatments on subjects and measures responses, using control groups, randomization, replication, and (where possible) blinding to establish cause-and-effect relationships.
Observational vs Experimental Studies
An observational study records data without imposing treatments, while an experiment deliberately manipulates a variable. Only experiments with random assignment can establish causation; observational studies can only show association.
Sampling Methods
Systematic approaches for selecting a subset of individuals from a population. The main probability methods are: simple random sample (SRS), stratified random sample, cluster sample, and systematic sample. Convenience sampling is a non-probability method that is generally biased.
Geometric Distribution
The probability distribution for the number of independent Bernoulli trials needed to get the first success, where each trial has success probability $p$.
Chi-Square Test
A family of hypothesis tests that use the chi-square statistic to compare observed frequencies to expected frequencies. The three main types are: goodness-of-fit (does data match a claimed distribution?), test of independence (are two categorical variables related?), and test of homogeneity (do different populations have the same distribution?).
Least Squares Regression Line
The unique straight line $\hat{y} = a + bx$ that minimizes the sum of squared vertical distances (residuals) between the observed data points and the line.
Residuals
The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).
Coefficient of Determination
The proportion of the total variation in the response variable $y$ that is explained by the linear relationship with the explanatory variable $x$. It equals the square of the correlation coefficient: $r^2$.
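The regression quantities above (the least-squares line, its residuals, and $r^2$) can be computed from their defining formulas; the data points below are hypothetical:

```python
x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values

mx, my = sum(x) / len(x), sum(y) / len(y)
# Slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2); intercept a = y_bar - b*x_bar.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot  # fraction of variation in y explained by x

print(b, a, r_squared)
```

The residuals of a least-squares fit always sum to zero, which the test below also checks.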
Inference for Regression
Using hypothesis tests and confidence intervals to draw conclusions about the true population slope $\beta_1$ of the linear relationship $y = \beta_0 + \beta_1 x + \varepsilon$, based on sample data.
Power of a Test
The probability that a hypothesis test correctly rejects a false null hypothesis. Power $= P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$, where $\beta$ is the probability of a Type II error.
Paired t-Test
A hypothesis test for the mean difference in a paired (matched) data design, where each subject provides two related measurements. The test analyzes the differences $d_i = x_{1i} - x_{2i}$ as a single sample.
Two-Sample Tests
Hypothesis tests and confidence intervals for comparing parameters (means or proportions) of two independent populations. The two-sample t-test compares means; the two-proportion z-test compares proportions.
Compound Probability
The probability of two or more events occurring together ($P(A \text{ and } B)$) or at least one occurring ($P(A \text{ or } B)$), accounting for whether the events are independent or dependent.
Experimental vs. Theoretical Probability
Theoretical probability is calculated from known outcomes ($P = \frac{\text{favorable}}{\text{total}}$), while experimental probability is estimated from actual trials ($P \approx \frac{\text{times event occurred}}{\text{total trials}}$). As the number of trials increases, experimental probability tends to approach theoretical probability.
Mean Absolute Deviation
The average distance between each data value and the mean of the data set. Calculated by finding the mean, computing the absolute value of each deviation from the mean, and averaging those absolute deviations.
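The three steps can be sketched directly; the data values are made up so the arithmetic is easy to follow:

```python
data = [2, 5, 7, 10]  # hypothetical values

m = len(data) and sum(data) / len(data)               # step 1: the mean (6.0)
deviations = [abs(x - m) for x in data]               # step 2: |x - mean| for each value
mad = sum(deviations) / len(deviations)               # step 3: average the absolute deviations

print(mad)  # (4 + 1 + 1 + 4) / 4 = 2.5
```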
Two-Way Tables
A table that displays frequencies for two categorical variables simultaneously, organized with one variable in rows and the other in columns. It shows joint frequencies (individual cells), marginal frequencies (row/column totals), and enables calculation of conditional frequencies.
Bayes' Theorem
Bayes' theorem gives the posterior probability of a hypothesis given evidence: $P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}$.
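A classic worked example (all rates hypothetical): screening for a rare condition, where even a fairly accurate test yields mostly false positives:

```python
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_h = 0.01            # prior P(H): has the condition
p_e_given_h = 0.95    # P(E|H): test is positive given the condition
p_e_given_not_h = 0.05  # P(E|not H): false-positive rate

# Total probability of a positive test: P(E) = P(E|H)P(H) + P(E|not H)P(not H).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: posterior P(H|E) = P(E|H) P(H) / P(E).
p_h_given_e = p_e_given_h * p_h / p_e

print(p_h_given_e)  # about 0.16: most positive results are false positives
```

The counterintuitive result comes from the low prior: the 5% false-positive rate applied to the 99% of people without the condition swamps the true positives.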