Statistics Concepts

63 concepts · Grades 3-5, 6-8, 9-12 · 96 prerequisite connections

This family view narrows the full math map to one connected cluster. Read it from left to right: earlier nodes support later ones, and dense middle sections usually mark the concepts that hold the largest share of future work together.

Use the graph to plan review, then use the full concept list below to open precise pages for definitions, examples, formulas, and related mistake guides.

Concept Dependency Graph

Concepts flow left to right, from foundational to advanced.

Connected Families

Statistics concepts have 31 connections to other families.

All Statistics Concepts

Mean

The arithmetic mean (average) of a data set is the sum of all values divided by the number of values.

6-8

Median

The median is the middle value of an ordered data set: half of the values are above it and half are below it.

6-8

Mode

The mode is the value or values that appear most frequently in a data set; it is the most common or most popular data value.

6-8
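As a quick illustration of the three measures of center defined so far, here is a minimal Python sketch using the standard-library `statistics` module. The data set `scores` is hypothetical and was chosen only so the results are easy to check by hand.

```python
from statistics import mean, median, multimode

# Hypothetical data set (an assumption for illustration only).
scores = [3, 5, 5, 6, 8, 9]

mean_value = mean(scores)      # sum of values / number of values = 36 / 6
median_value = median(scores)  # even count, so the two middle values (5 and 6) are averaged
modes = multimode(scores)      # list of every value tied for most frequent

print(mean_value, median_value, modes)
```

`multimode` returns a list because a data set can have more than one mode.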

Range (Statistics)

The statistical range is the difference between the maximum and minimum values in a data set: $\text{range} = \max - \min$.

6-8

Standard Deviation

The standard deviation measures the typical distance of data values from the mean. It is the square root of the variance, so it is not the same as the plain average distance (the mean absolute deviation).

9-12

Variance

The variance is the average of the squared deviations from the mean: $\sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$. It is the square of the standard deviation.

9-12
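The relationship between variance and standard deviation can be checked on a small example. The sketch below uses a hypothetical data set and the population versions from Python's `statistics` module, which divide by $n$ as in the formula above.

```python
from statistics import pstdev, pvariance

# Hypothetical values (an assumption for illustration); their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: average of the squared deviations from the mean (divides by n).
pop_variance = pvariance(data)

# Population standard deviation: the square root of the variance.
pop_sd = pstdev(data)

print(pop_variance, pop_sd)
```

When the data are a sample rather than a whole population, `statistics.variance` and `statistics.stdev` divide by $n - 1$ instead.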

Expected Value

The expected value of a random variable is the probability-weighted average of all possible outcomes, that is, the long-run mean over many repetitions.

9-12
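To make "probability-weighted average" concrete, the following sketch computes the expected value of one roll of a fair six-sided die. The die is an assumed example, and exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

# Multiply each outcome by its probability, then add the products.
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))

print(expected_value)
```

The result, 7/2, is not a face of the die: an expected value describes the long-run average, not a single outcome.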

Normal Distribution

The normal distribution (also called the Gaussian distribution or bell curve) is a continuous probability distribution that is symmetric about its mean, with data tapering off equally on both sides following a precise mathematical rule.

9-12

Z-Score

A z-score measures how many standard deviations a data value is above or below the mean: $z = (x - \mu)/\sigma$.

9-12
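A short sketch of the z-score formula, with hypothetical numbers (a value of 85 in a distribution with mean 70 and standard deviation 10). The helper name `z_score` is illustrative; `NormalDist` comes from Python's standard library.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z = z_score(85, mu=70, sigma=10)  # (85 - 70) / 10

# If the data follow a normal distribution, the share of values below x
# is the standard normal cumulative probability at z.
share_below = NormalDist().cdf(z)

print(z, round(share_below, 4))
```

The second step only applies when a normal model is reasonable for the data; the z-score itself can be computed for any distribution.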

Permutation

A permutation is an ordered arrangement of objects. The number of ways to choose and order $r$ items from $n$ distinct items is $P(n,r) = \frac{n!}{(n-r)!}$.

9-12

Combination

A combination is an unordered selection of objects. The number of ways to choose $r$ items from $n$ distinct items is $C(n,r) = \frac{n!}{r!(n-r)!}$.

9-12

Factorial

The factorial of a non-negative integer $n$, written $n!$, is the product of all positive integers from 1 to $n$: $n! = n \cdot (n-1) \cdots 2 \cdot 1$.

9-12
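The three counting formulas above are available directly in Python's `math` module (version 3.8 or later). The sketch below uses the hypothetical case $n = 5$, $r = 2$ and checks how the formulas relate.

```python
import math

n, r = 5, 2  # hypothetical: choose 2 items from 5 distinct items

n_factorial = math.factorial(n)  # 5 * 4 * 3 * 2 * 1
ordered = math.perm(n, r)        # P(n, r) = n! / (n - r)!
unordered = math.comb(n, r)      # C(n, r) = n! / (r! (n - r)!)

# Each unordered selection can be arranged in r! orders, so P(n, r) = C(n, r) * r!.
assert ordered == unordered * math.factorial(r)

print(n_factorial, ordered, unordered)
```

Because order matters for permutations but not for combinations, $P(n,r)$ is never smaller than $C(n,r)$.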

Correlation

Correlation measures the strength and direction of the linear relationship between two quantitative variables, ranging from $-1$ to $+1$.

6-8
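The definition above does not give a formula, so the sketch below assumes the usual Pearson correlation coefficient, computed from deviations about each mean. The function name `pearson_r` and the data (`hours`, `scores`) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]   # hypothetical explanatory values
scores = [2, 4, 5, 4, 5]  # hypothetical response values

r = pearson_r(hours, scores)
print(round(r, 4))
```

Points that fall exactly on an increasing line give $r = 1$; a value near 0 means little linear relationship, though a curved relationship may still exist.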

Scatter Plot

A scatter plot is a graph with one quantitative variable on each axis where each data point is plotted as a dot, revealing relationships between the two variables.

6-8

Histogram

A histogram is a bar chart of a frequency distribution where bars represent count or density of data within consecutive equal-width intervals (bins).

6-8

Box Plot

A box plot displays the five-number summary (minimum, Q1, median, Q3, maximum) of a data set using a box and whiskers.

6-8

Quartiles

Quartiles divide an ordered data set into four equal parts: Q1 is the 25th percentile, Q2 is the median (50th), and Q3 is the 75th percentile.

6-8

Interquartile Range

The interquartile range (IQR) is $Q3 - Q1$: the spread of the middle 50% of the data, resistant to outliers.

6-8
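The five-number summary, quartiles, and IQR can be computed together. The sketch below uses hypothetical data and `statistics.quantiles` with `method="inclusive"`; this is one of several quartile conventions, and textbooks that take the median of each half can give slightly different values for Q1 and Q3.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical, already ordered

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle 50% of the data

# The five values a box plot is drawn from.
five_number_summary = (min(data), q1, q2, q3, max(data))

print(five_number_summary, iqr)
```

Changing the largest value from 9 to 900 would change the maximum and the range but leave the IQR the same, which is what "resistant to outliers" means.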

Data (Abstract)

Data is a collection of recorded observations or measurements used to describe, analyze, or make inferences about a phenomenon or population.

3-5

Measurement

Measurement is the process of assigning numerical values to attributes of objects or events according to a defined rule or scale.

3-5

Variability

Variability is the degree to which data points in a set differ from each other and from the center of the distribution.

6-8

Noise

Noise is random variation in data that is not explained by the underlying pattern or model: the unpredictable fluctuations around the true signal.

6-8

Signal vs Noise

Distinguishing meaningful patterns (signal) from random variation (noise) in data.

6-8

Distribution (Intuition)

A distribution describes how data values are spread out across their range: which values occur, how often, and whether the data is symmetric or skewed.

6-8

Center vs Spread

Center versus spread describes two complementary aspects of any data distribution: center (mean, median) tells you where the typical value lies, while spread (range, IQR, standard deviation) tells you how much the values vary around that center.

6-8

Outliers (Deep)

An outlier is a data value that lies unusually far from most other values, potentially indicating measurement error, a rare event, or an important exception.

6-8

Dependence (Statistical)

When the probability of one event changes based on whether another event occurred.

6-8

Causation

Causation exists when one variable directly produces or influences a change in another variable, as distinct from mere correlation or association.

9-12

Sampling Bias

Sampling bias occurs when the method of selecting a sample systematically over- or under-represents certain groups relative to their actual proportion in the population.

6-8

Representativeness

A sample is representative if its characteristics (distribution of key variables) closely match those of the population it is meant to represent.

6-8

Law of Large Numbers (Intuition)

As sample size increases, the sample average approaches the true population average.

9-12
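A simulation can make this idea visible. The sketch below assumes a fair six-sided die (population mean 3.5) and a fixed random seed so the run is repeatable; the helper `mean_of_rolls` is a hypothetical name.

```python
import random

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability

def mean_of_rolls(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# Small samples can land far from 3.5; large samples tend to land close to it.
for n in (10, 1_000):
    print(n, mean_of_rolls(n))

large_sample_mean = mean_of_rolls(100_000)
print(100_000, large_sample_mean)
```

The law describes a tendency, not a guarantee for any single run: an individual small sample can still happen to land close to 3.5.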

Prediction

A prediction is a model-based estimate of an unknown or future value, accompanied by a measure of confidence or uncertainty.

6-8

Model Fit (Intuition)

Model fit describes how closely a statistical model's predictions match the observed data, as measured by residuals, $R^2$, or loss functions.

9-12

Overfitting (Intuition)

Overfitting occurs when a model learns the noise in training data instead of just the underlying pattern, performing well on training data but poorly on new data.

9-12

Underfitting (Intuition)

Underfitting occurs when a model is too simple to capture the true pattern in the data, performing poorly on both training data and new data.

9-12

Data Visualization

Data visualization is the use of graphs, charts, and other visual representations to communicate patterns, trends, and relationships in data.

3-5

Misleading Graphs

A misleading graph is a data visualization that distorts the true pattern through truncated axes, unequal intervals, cherry-picked data, or manipulated scales.

6-8

Scale Distortion

Scale distortion occurs when a graph's axis does not start at zero or uses inconsistent intervals, making small differences appear large or large differences appear small.

6-8

Aggregation

Aggregation is the process of combining many individual data values into a single summary statistic such as a sum, mean, count, or proportion.

6-8

Normalization (Statistics)

Normalization rescales data to a standard range or distribution, such as $[0,1]$ or zero mean and unit variance, to make different variables comparable.

6-8
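The two rescalings named in the definition can be sketched in a few lines. The data are hypothetical, and the formulas used here, min-max scaling and z-score standardization, are the common choices for those two targets rather than the only possible ones.

```python
from statistics import mean, pstdev

values = [10, 20, 30, 40, 50]  # hypothetical data

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Standardization: subtract the mean and divide by the standard deviation,
# which gives zero mean and unit variance.
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

print(min_max)
print([round(z, 3) for z in standardized])
```

Both versions keep the order and relative spacing of the values; only the units change.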

Proportional Data

Proportional data expresses quantities as fractions or percentages of a whole, enabling fair comparison across groups of different sizes.

6-8

Comparative Statistics

Comparative statistics involves using statistical measures to compare two or more groups, data sets, or distributions.

6-8

Probabilistic Thinking

Probabilistic thinking is the habit of reasoning about uncertain outcomes in terms of likelihood, expected value, and distributions rather than certainties.

6-8

Sampling Distribution

The probability distribution of a statistic (such as the sample mean) computed from all possible random samples of the same size drawn from a population.

9-12

Central Limit Theorem

For sufficiently large sample size ($n \geq 30$ as a rule of thumb), the sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, regardless of the shape of the population distribution.

9-12
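The claim about the mean and standard deviation of the sampling distribution can be checked by simulation. The sketch below assumes a strongly right-skewed population (exponential with $\mu = 1$ and $\sigma = 1$), a sample size of 30, and a fixed seed; all of these are illustrative choices.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(1)  # fixed seed (an arbitrary choice) for repeatability
n = 30                  # size of each sample
num_samples = 5_000     # how many samples to draw

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)  # should be close to mu = 1
sd_of_means = pstdev(sample_means)   # should be close to sigma / sqrt(n)
predicted_sd = 1 / math.sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(predicted_sd, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from symmetric.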

Confidence Interval

A range of values, computed from sample data by a method that captures the true population parameter in a specified percentage of repeated samples (the confidence level).

9-12

Margin of Error

The maximum expected difference between the sample statistic and the true population parameter; it is half the width of a confidence interval.

9-12
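As one concrete case, the sketch below builds a 95% confidence interval for a population mean. It assumes the population standard deviation is known, so a z critical value applies; the sample mean, $\sigma$, and $n$ are hypothetical numbers.

```python
import math
from statistics import NormalDist

sample_mean = 50.0  # hypothetical sample statistic
sigma = 10.0        # population standard deviation, assumed known
n = 100             # sample size
confidence = 0.95

# Critical value z*: cuts off (1 - confidence) / 2 in each tail of the standard normal.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error = critical value * standard deviation of the sample mean.
margin_of_error = z_star * sigma / math.sqrt(n)

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(round(margin_of_error, 3), (round(lower, 2), round(upper, 2)))
```

The interval is the statistic plus or minus the margin of error, which is why the margin is half the interval's width. When $\sigma$ is estimated from the sample, a t critical value is normally used instead.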

Hypothesis Testing

A systematic method to decide whether sample data provides enough evidence to reject a claim (null hypothesis) about a population parameter.

9-12

P-Value

The probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming the null hypothesis $H_0$ is true.

9-12
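A p-value can be computed by hand for a simple case. The sketch below assumes a two-sided one-sample z-test with a known population standard deviation; the null value, $\sigma$, $n$, and sample mean are all hypothetical.

```python
import math
from statistics import NormalDist

mu_0 = 100.0         # population mean claimed by H0 (hypothetical)
sigma = 15.0         # population standard deviation, assumed known
n = 36               # sample size
sample_mean = 105.0  # observed sample mean (hypothetical)

# Test statistic: how many standard errors the sample mean is from mu_0.
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Two-sided p-value: probability of a statistic at least this far from 0
# in either direction, computed as if H0 were true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, round(p_value, 4))
```

A small p-value means the observed result would be unusual if $H_0$ were true; it is not the probability that $H_0$ is true.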

Type I and Type II Errors

Type I error: rejecting $H_0$ when it is actually true (false positive); its probability is $\alpha$. Type II error: failing to reject $H_0$ when it is actually false (false negative); its probability is $\beta$.

9-12

Experimental Design

The deliberate planning of a study in which the researcher imposes treatments on subjects and measures responses, using control groups, randomization, replication, and (where possible) blinding to establish cause-and-effect relationships.

6-8

Observational vs Experimental Studies

An observational study records data without imposing treatments, while an experiment deliberately manipulates a variable. Only experiments with random assignment can establish causation; observational studies can only show association.

6-8

Sampling Methods

Systematic approaches for selecting a subset of individuals from a population. The main probability methods are: simple random sample (SRS), stratified random sample, cluster sample, and systematic sample. Convenience sampling is a non-probability method that is generally biased.

6-8

Chi-Square Test

A family of hypothesis tests that use the chi-square statistic to compare observed frequencies to expected frequencies. The three main types are: goodness-of-fit (does data match a claimed distribution?), test of independence (are two categorical variables related?), and test of homogeneity (do different populations have the same distribution?).

9-12
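The definition names the chi-square statistic without writing it out; the sketch below assumes the standard form, the sum of (observed − expected)² / expected over all categories, in a goodness-of-fit setting. The observed counts are hypothetical rolls of a die being tested for fairness.

```python
# Hypothetical counts for faces 1 through 6 over 120 rolls.
observed = [18, 22, 20, 25, 15, 20]

# Under H0 (a fair die) every face is expected equally often.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Chi-square statistic: squared gaps between observed and expected counts,
# each scaled by the expected count, then summed.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness-of-fit degrees of freedom: number of categories minus 1.
degrees_of_freedom = len(observed) - 1

print(round(chi_square, 2), degrees_of_freedom)
```

Turning the statistic into a p-value requires the chi-square distribution with the given degrees of freedom, which this sketch leaves out because it is not in Python's standard library.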

Least Squares Regression Line

The unique straight line $\hat{y} = a + bx$ that minimizes the sum of squared vertical distances (residuals) between the observed data points and the line.

9-12

Residuals

The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

9-12

Coefficient of Determination

The proportion of the total variation in the response variable $y$ that is explained by the linear relationship with the explanatory variable $x$. It equals the square of the correlation coefficient: $r^2$.

9-12
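The regression line, its residuals, and the coefficient of determination can be computed together. The sketch below uses hypothetical data and the standard least-squares formulas for the slope and intercept, which the definitions above do not spell out.

```python
x = [1, 2, 3, 4, 5]  # hypothetical explanatory values
y = [2, 4, 5, 4, 5]  # hypothetical response values

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx            # slope of the least-squares line
a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)

predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]  # observed minus predicted

sse = sum(e ** 2 for e in residuals)        # variation left unexplained by the line
sst = sum((yi - mean_y) ** 2 for yi in y)   # total variation in y
r_squared = 1 - sse / sst

print(round(a, 2), round(b, 2), round(r_squared, 2))
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), which is a useful check on the arithmetic.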

Inference for Regression

Using hypothesis tests and confidence intervals to draw conclusions about the true population slope $\beta_1$ of the linear relationship $y = \beta_0 + \beta_1 x + \varepsilon$, based on sample data.

9-12

Power of a Test

The probability that a hypothesis test correctly rejects a false null hypothesis. Power $= P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - \beta$, where $\beta$ is the probability of a Type II error.

9-12
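Power depends on which alternative is actually true, so any calculation needs an assumed true value. The sketch below assumes a one-sided z-test ($H_0: \mu = 100$ against $\mu > 100$) with known $\sigma$, and a hypothetical true mean of 105.

```python
import math
from statistics import NormalDist

mu_0 = 100.0     # mean under H0 (hypothetical)
mu_true = 105.0  # assumed true mean, needed to compute power
sigma = 15.0     # population standard deviation, assumed known
n = 36
alpha = 0.05     # probability of a Type I error

standard_error = sigma / math.sqrt(n)

# H0 is rejected when the sample mean exceeds this cutoff.
cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * standard_error

# Power: probability the sample mean exceeds the cutoff when the true mean is mu_true.
power = 1 - NormalDist(mu_true, standard_error).cdf(cutoff)
beta = 1 - power  # probability of a Type II error

print(round(cutoff, 2), round(power, 3), round(beta, 3))
```

Raising `n`, raising `alpha`, or moving `mu_true` further from `mu_0` would each increase the power in this setup.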

Paired t-Test

A hypothesis test for the mean difference in a paired (matched) data design, where each subject provides two related measurements. The test analyzes the differences $d_i = x_{1i} - x_{2i}$ as a single sample.

9-12
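The key step in a paired test is reducing two columns of data to one column of differences. The sketch below uses hypothetical paired measurements and assumes the standard one-sample t statistic applied to those differences, testing whether the mean difference is zero.

```python
import math
from statistics import mean, stdev

x1 = [12, 15, 11, 14, 13]  # hypothetical first measurement per subject
x2 = [10, 14, 10, 11, 12]  # hypothetical second measurement, same subjects

# Analyze the per-subject differences as a single sample.
differences = [a - b for a, b in zip(x1, x2)]

n = len(differences)
mean_diff = mean(differences)
sd_diff = stdev(differences)  # sample standard deviation (divides by n - 1)

# t statistic for H0: the mean difference is 0.
t_statistic = mean_diff / (sd_diff / math.sqrt(n))
degrees_of_freedom = n - 1

print(differences, round(t_statistic, 3), degrees_of_freedom)
```

The p-value would come from a t distribution with `degrees_of_freedom`, which is not part of Python's standard library and is left out of this sketch.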

Two-Sample Tests

Hypothesis tests and confidence intervals for comparing parameters (means or proportions) of two independent populations. The two-sample t-test compares means; the two-proportion z-test compares proportions.

9-12

Mean Absolute Deviation

The average distance between each data value and the mean of the data set. Calculated by finding the mean, computing the absolute value of each deviation from the mean, and averaging those absolute deviations.

6-8
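The three steps in the definition translate directly into code. The data set below is hypothetical.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

# Step 1: find the mean.
center = sum(data) / len(data)

# Step 2: absolute value of each deviation from the mean.
absolute_deviations = [abs(v - center) for v in data]

# Step 3: average those absolute deviations.
mad = sum(absolute_deviations) / len(absolute_deviations)

print(center, mad)
```

Without the absolute value in step 2, the positive and negative deviations would cancel and the average would always be zero.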

Two-Way Tables

A table that displays frequencies for two categorical variables simultaneously, organized with one variable in rows and the other in columns. It shows joint frequencies (individual cells), marginal frequencies (row/column totals), and enables calculation of conditional frequencies.

6-8
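The sketch below stores a small two-way table as a nested dictionary and derives the marginal and conditional frequencies from it. The categories and counts (grade level by how students get to school) are hypothetical.

```python
# Joint frequencies: rows are grade levels, columns are travel methods.
counts = {
    "6th grade": {"bus": 30, "walk": 20},
    "7th grade": {"bus": 10, "walk": 40},
}
columns = ["bus", "walk"]

# Marginal frequencies: totals for each row and each column.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
column_totals = {col: sum(counts[row][col] for row in counts) for col in columns}
grand_total = sum(row_totals.values())

# Conditional relative frequency: among 6th graders, what fraction take the bus?
bus_given_6th = counts["6th grade"]["bus"] / row_totals["6th grade"]

# Marginal relative frequency: what fraction of all students take the bus?
bus_overall = column_totals["bus"] / grand_total

print(row_totals, column_totals, bus_given_6th, bus_overall)
```

Here the conditional frequency (0.6) differs from the marginal one (0.4), which suggests an association between grade level and travel method in this made-up data.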