📊

Statistics Core

65 concepts in Statistics

Statistics is the science of learning from data. This core topic covers the entire statistical investigation cycle: formulating questions, collecting data through surveys and experiments, organizing and analyzing data with numerical summaries and visualizations, and drawing conclusions while acknowledging uncertainty. Students learn descriptive statistics (measures of center, spread, and shape) as well as the foundations of inferential statistics, including sampling distributions, confidence intervals, and hypothesis testing. Probability serves as the mathematical backbone, connecting data patterns to theoretical models. Particular attention is given to recognizing bias in data collection, understanding correlation versus causation, and evaluating statistical claims encountered in media and research. In an era of big data and data-driven decisions, statistical literacy is essential for informed citizenship, scientific inquiry, and professional competence across virtually every field.

Suggested learning path: Begin with exploratory data analysis and graphical displays, then study probability fundamentals, and progress to sampling distributions, confidence intervals, and basic hypothesis testing.

Data Collection

The systematic process of gathering information (data) to answer questions or learn about a topic.

Prerequisites:
counting

Data Representation

Organizing and displaying data in ways that make patterns and information easier to see and understand.

Prerequisites:
data collection
counting

Bar Graph

A graph that uses rectangular bars of different heights or lengths to compare quantities across categories.

Prerequisites:
data representation
counting

Line Graph

A graph that uses points connected by lines to show how a quantity changes over time or across a continuous variable.

Prerequisites:
data representation
coordinate plane

Mean as Fair Share

The mean (average) represents what each person would get if the total were divided equally among everyone.

Prerequisites:
addition
division
equal sharing
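A minimal Python sketch of the fair-share idea: pool everything, then split it equally. The function name `fair_share_mean` is illustrative only, not from any library.

```python
def fair_share_mean(values):
    """Return the mean by pooling all values and splitting the total equally."""
    total = sum(values)          # pool everything together
    return total / len(values)   # share it equally among everyone

# Four friends with 3, 5, 6, and 10 stickers pool 24 stickers; each gets 6.
print(fair_share_mean([3, 5, 6, 10]))  # 6.0
```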

Data Variability

How much the values in a data set are spread out or clustered together around the center.

Prerequisites:
mean fair share
number comparison

Pictograph

A graph that uses pictures or symbols to represent data, where each symbol stands for a certain number of items.

Prerequisites:
counting
skip counting

Line Plot (Dot Plot)

A diagram showing data values as marks (usually X's or dots) above their values on a number line.

Prerequisites:
number line
counting

Mode

The value that appears most often in a data set. A set can have no mode, one mode, or multiple modes.

Prerequisites:
counting
comparison
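A short sketch that returns every mode, so it can report no mode, one mode, or several. Conventions differ on when a set "has no mode"; this version assumes the common classroom rule that a set in which no value repeats has no mode. The name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; return [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: if no value repeats, the set has no mode.
    if top == 1 and len(values) > 1:
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([2, 3, 3, 5]))      # [3]
print(modes([1, 1, 2, 2, 7]))   # [1, 2]
print(modes([4, 8, 9]))         # []
```

Because it only counts, the same function works for categorical data such as colors.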

Range

The difference between the maximum and minimum values in a data set, measuring overall spread.

Prerequisites:
subtraction
comparison

Tally Chart

A table that uses tally marks (lines) to count and organize data, with every fifth mark crossing the previous four.

Prerequisites:
counting
skip counting

Frequency Table

A table that records how often each value or category occurs in a data set, organizing raw data into a clear summary.

Prerequisites:
counting
tally chart

Categorical Data

Data that can be sorted into groups or categories, like colors, types, or names, rather than measured with numbers.

Prerequisites:
classification
counting

Statistical Question

A question that anticipates variability in the answers: it can't be answered with a single value, because different data points give different responses.

Prerequisites:
questioning
data collection

Median

The middle value when the data are arranged in order; with an even number of values, it is the mean of the two middle values. Half the values lie at or below it and half at or above.

Prerequisites:
ordering numbers
mean fair share
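A minimal sketch of finding the median by hand, covering both the odd and even cases (the function name is illustrative; Python's `statistics.median` follows the same rule):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```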

Mean vs Median

Mean and median are both measures of center but respond differently to extreme values (outliers).

Prerequisites:
mean
median intro
outliers
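To see the different responses to an outlier, here is a small sketch using Python's standard `statistics` module on made-up salary data (in thousands of dollars). Adding one extreme value drags the mean far more than the median.

```python
from statistics import mean, median

salaries = [40, 42, 45, 47, 50]   # hypothetical data, thousands of dollars
with_outlier = salaries + [400]   # add one extreme value

print(mean(salaries), median(salaries))          # mean 44.8, median 45
print(mean(with_outlier), median(with_outlier))  # mean jumps to 104, median only moves to 46
```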

Spread vs Center

Center describes where the 'middle' of data lies; spread describes how far data extends from that center.

Prerequisites:
mean
variability intro

Correlation

A statistical relationship between two variables where changes in one are associated with changes in the other.

Prerequisites:
scatter plot
variables

Correlation vs Causation

Correlation shows two variables move together; causation means one actually makes the other change. Correlation doesn't prove causation.

Prerequisites:
correlation intro
variables

Sampling Bias

When a sample is collected in a way that makes some members of the population more likely to be included than others, leading to misleading conclusions.

Prerequisites:
data collection
population vs sample

Basic Probability

The chance or likelihood that an event will occur, expressed as a number between 0 (impossible) and 1 (certain).

Prerequisites:
fractions
ratios

Histogram

A graph that groups numerical data into ranges (bins) and shows the frequency of values in each range using bars that touch.

Prerequisites:
bar graph
data ranges

Box Plot

A visual display showing the five-number summary: minimum, Q1, median, Q3, and maximum, often with outliers marked separately.

Prerequisites:
median intro
quartiles

Standard Deviation

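A measure of how spread out data values are from the mean: roughly the typical distance of a value from the mean, computed as the square root of the average squared deviation (the variance). The sample standard deviation divides the sum of squared deviations by $n - 1$ instead of $n$.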

Prerequisites:
mean
variability intro
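A sketch of the calculation step by step: find the mean, square each deviation, average the squares, then take the square root. The `sample` flag switches between dividing by $n$ (population) and $n - 1$ (sample); the function name is illustrative, and the standard library's `statistics.pstdev` and `statistics.stdev` compute the same two quantities.

```python
import math

def std_dev(values, sample=False):
    """Square root of the average squared deviation from the mean.

    sample=True divides by n - 1 (sample standard deviation);
    otherwise divides by n (population standard deviation).
    """
    n = len(values)
    mu = sum(values) / n
    squared_devs = [(x - mu) ** 2 for x in values]
    variance = sum(squared_devs) / (n - 1 if sample else n)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))               # 2.0 (population)
print(std_dev(data, sample=True))  # about 2.138 (sample)
```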

Misleading Graphs

Graphs can distort data through tricks like truncated axes, inconsistent scales, or cherry-picked time ranges to create false impressions.

Prerequisites:
bar graph
line graph
scaling

Experimental Design

The careful planning of experiments to establish cause-and-effect relationships by controlling variables and using comparison groups.

Prerequisites:
correlation vs causation
variables

Two-Way Tables

A table that displays the frequency of data categorized by two different variables, allowing comparison across groups.

Prerequisites:
data representation
categorical data

Random Sampling

Selecting individuals from a population where every member has an equal chance of being chosen.

Prerequisites:
sampling bias
population vs sample

Dot Plot

A statistical chart using dots to display the frequency of different values, similar to a line plot but often used for larger datasets.

Prerequisites:
line plot
frequency

Quartiles

Values that divide ordered data into four equal parts: $Q_1$ (25th percentile), $Q_2$ (median, 50th), and $Q_3$ (75th percentile).

Prerequisites:
median intro
ordering numbers

Interquartile Range (IQR)

The range of the middle 50% of the data, calculated as $Q_3 - Q_1$. Because it ignores the lowest and highest quarters of the data, it is resistant to extreme values.

Prerequisites:
quartiles
range stat
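Quartile conventions vary. This sketch assumes the "median of each half" method often taught in middle school: split the sorted data at the median (leaving the median out when $n$ is odd) and take the median of each half. Other conventions, such as the interpolating methods in Python's `statistics.quantiles`, can give slightly different values. The helper names are illustrative, and at least two data values are assumed.

```python
def median(values):
    s = sorted(values)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (median excluded when n is odd)."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # values below the median
    upper = s[(n + 1) // 2 :]    # values above the median
    return median(lower), median(s), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)        # 3.5 7 11.0
print("IQR =", q3 - q1)  # IQR = 7.5
```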

Mean Absolute Deviation (MAD)

The average distance of data points from the mean, ignoring whether they're above or below.

Prerequisites:
mean
absolute value
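A minimal sketch of the MAD calculation (the function name is illustrative): average the absolute distances from the mean.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean, ignoring direction."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

# Mean is 5; distances are 3, 1, 1, 3, so the MAD is 8 / 4 = 2.
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
```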

Scatter Plot

A graph that uses dots to show the relationship between two numerical variables, with each dot representing one data point.

Prerequisites:
coordinate plane
two variable data

Distribution Shape

The overall pattern of how data values are spread, including whether the distribution is symmetric, skewed left, skewed right, uniform, or bimodal.

Prerequisites:
histogram
bar graph

Population vs Sample

Population is the entire group you want to study; sample is the smaller subset you actually measure to learn about the population.

Prerequisites:
data collection
set concept

Theoretical Probability

The probability of an event found by mathematical reasoning about equally likely outcomes, without conducting experiments: the number of favorable outcomes divided by the total number of possible outcomes.

Prerequisites:
probability basic
fractions

Experimental Probability

The probability of an event based on actual experimental data: the number of times the event occurred divided by total trials.

Prerequisites:
probability basic
data collection

Sample Space

The complete set of all possible outcomes for a probability experiment, listed without repetition.

Prerequisites:
counting
sets

Compound Events

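Events made up of two or more simple events. For "A and B", multiply: $P(A \text{ and } B) = P(A) \cdot P(B)$ when $A$ and $B$ are independent. For "A or B", add and subtract the overlap: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$, which reduces to $P(A) + P(B)$ when the events are mutually exclusive.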

Prerequisites:
probability basic
sample space
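One way to check the "and" and "or" rules is to list the whole sample space and count. This sketch, assuming two fair six-sided dice, builds the 36 equally likely outcomes with `itertools.product` and uses exact `Fraction` arithmetic; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair six-sided dice: 36 equally likely ordered pairs.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Theoretical probability: favorable outcomes divided by total outcomes."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

def first_is_six(outcome):
    return outcome[0] == 6

def second_is_six(outcome):
    return outcome[1] == 6

p_and = prob(lambda o: first_is_six(o) and second_is_six(o))
p_or = prob(lambda o: first_is_six(o) or second_is_six(o))

print(p_and)  # 1/36: equals (1/6) * (1/6) because the two dice are independent
print(p_or)   # 11/36: equals 1/6 + 1/6 - 1/36, removing the double-counted (6, 6)
```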

Relative Frequency

The fraction or percentage of times a value occurs out of the total number of observations.

Prerequisites:
fractions
percentages
frequency

Normal Distribution

A symmetric, bell-shaped probability distribution where most data clusters around the mean, with probabilities decreasing symmetrically toward the tails.

Prerequisites:
distribution shape
standard deviation intro

Z-Score (Standard Score)

The number of standard deviations a value is from the mean: $z = \frac{x - \mu}{\sigma}$.

Prerequisites:
standard deviation intro
mean
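The formula $z = \frac{x - \mu}{\sigma}$ translates directly into code. The test-score numbers below are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical test scores with mean 70 and standard deviation 8.
print(z_score(86, 70, 8))  # 2.0  -> two standard deviations above the mean
print(z_score(66, 70, 8))  # -0.5 -> half a standard deviation below the mean
```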

Percentiles

Values that divide a distribution into 100 equal parts. The nth percentile is the value below which n% of data falls.

Prerequisites:
quartiles
ordering numbers

Sampling Distribution

The probability distribution of a statistic (like the mean) calculated from all possible samples of a given size from a population.

Prerequisites:
population vs sample
mean
standard deviation intro

Central Limit Theorem

For large enough samples drawn independently from a population with finite variance, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution.

Prerequisites:
sampling distribution
normal distribution
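A small simulation can make the theorem concrete. This sketch assumes a strongly right-skewed population (exponential with rate 1, so mean 1 and standard deviation 1), draws 2000 samples of size 30, and checks that the sample means center on 1, have spread near $1/\sqrt{30}$, and put roughly 68% of their values within one standard deviation, as a normal curve would. The seed and sizes are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the simulation is reproducible

# Assumed population: exponential with rate 1 (right-skewed, mean 1, SD 1).
n = 30        # size of each sample
reps = 2000   # number of samples drawn

sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(sample_means)  # should be near the population mean, 1
spread = statistics.stdev(sample_means)  # should be near 1 / sqrt(30), about 0.18
# For a normal curve, about 68% of values lie within 1 SD of the center.
within_one_sd = sum(abs(m - 1) < 1 / math.sqrt(n) for m in sample_means) / reps

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f}")
print(f"share within 1 SD:    {within_one_sd:.2f}")
```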

Standard Error

The standard deviation of a sampling distribution, measuring how much a sample statistic typically varies from the true population parameter.

Prerequisites:
standard deviation intro
sampling distribution

Confidence Interval

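A range of values, calculated from sample data, that is likely to contain the true population parameter. The confidence level (for example, 95%) describes the method: in repeated sampling, about that percentage of intervals constructed this way would capture the true parameter.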

Prerequisites:
standard error
sampling distribution
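A sketch of a confidence interval for a mean, assuming a large sample so that a normal (z) critical value is reasonable; for small samples a t critical value is more appropriate, and the standard library does not provide the t distribution. It estimates the standard error as $s/\sqrt{n}$, multiplies by the critical value to get the margin of error, and uses simulated data. The function name is illustrative.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Approximate large-sample z-interval for a population mean."""
    n = len(values)
    x_bar = fmean(values)
    se = stdev(values) / math.sqrt(n)               # estimated standard error of the mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value, about 1.96 for 95%
    margin = z * se                                 # margin of error
    return x_bar - margin, x_bar + margin

# Simulated data (seeded): 100 draws from a normal population with mean 50 and SD 10.
rng = random.Random(1)
data = [rng.gauss(50, 10) for _ in range(100)]

low, high = mean_confidence_interval(data)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```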

Margin of Error

The amount added to and subtracted from a sample statistic to form a confidence interval (half the interval's width), typically reported as $\pm$ a value and computed as a critical value times the standard error.

Prerequisites:
confidence interval
standard error

Linear Regression

A statistical method for modeling the relationship between variables by fitting a line; ordinary least squares chooses the line that minimizes the sum of squared vertical distances (residuals) from the data points to the line.

Prerequisites:
scatter plot
correlation intro
slope intercept
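A minimal sketch of the least-squares formulas for one predictor: slope $= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$. The study-hours data are made up, and the function name is illustrative; Python 3.10+ also offers `statistics.linear_regression` for the same fit.

```python
def least_squares_line(xs, ys):
    """Slope and intercept minimizing the sum of squared vertical residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept

hours = [1, 2, 3, 4, 5]         # hypothetical hours studied
scores = [52, 55, 61, 64, 68]   # hypothetical test scores
m, b = least_squares_line(hours, scores)
print(f"y = {m:.2f}x + {b:.2f}")  # y = 4.10x + 47.70
```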

Line of Best Fit

The straight line that best represents the trend in a scatter plot, usually found by least squares, which minimizes the sum of squared vertical distances between the data points and the line.

Prerequisites:
scatter plot
slope intercept

Residuals

The differences between observed data values and the values predicted by a model (actual - predicted).

Prerequisites:
linear regression
prediction

R-Squared (Coefficient of Determination)

The proportion of variance in the dependent variable that is explained by the independent variable(s) in a regression model, ranging from 0 to 1.

Prerequisites:
linear regression
variance
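A sketch connecting residuals and $R^2$: fit the least-squares line, compute each residual (actual minus predicted), then compare the leftover variation with the total variation, $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$. The data are the same made-up hours and scores used for illustration; the function name is hypothetical.

```python
def r_squared(xs, ys):
    """Fit a least-squares line, then return 1 - SS_res / SS_tot."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]  # actual - predicted
    ss_res = sum(e ** 2 for e in residuals)     # variation the line leaves unexplained
    ss_tot = sum((y - y_bar) ** 2 for y in ys)  # total variation around the mean
    return 1 - ss_res / ss_tot

hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
print(f"{r_squared(hours, scores):.3f}")  # 0.989
```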

Outlier Detection

Methods for identifying data points that are unusually far from the rest, using techniques such as the IQR rule (flagging values more than $1.5 \times \text{IQR}$ below $Q_1$ or above $Q_3$), z-scores, or visual inspection.

Prerequisites:
interquartile range
z score
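A sketch of the IQR rule, assuming the median-of-halves quartile convention (other quartile methods can shift the fences slightly) and the customary multiplier 1.5. Helper names and data are illustrative.

```python
def _median(s):
    """Median of an already-sorted list."""
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def iqr_outliers(values, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    s = sorted(values)
    n = len(s)
    q1 = _median(s[: n // 2])        # median of the lower half
    q3 = _median(s[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

data = [12, 14, 14, 15, 16, 17, 18, 19, 45]
print(iqr_outliers(data))  # [45]  (fences are 7.25 and 25.25)
```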

Observational vs Experimental Studies

Observational studies observe subjects without manipulation; experiments deliberately assign treatments to establish causation.

Prerequisites:
experimental design
correlation vs causation

Confounding Variables

A variable that influences both the independent and dependent variables, creating a spurious association that can be mistaken for causation.

Prerequisites:
correlation vs causation
variables

Statistical Simulation

Using random number generation to model real-world processes and estimate probabilities or outcomes that are difficult to calculate theoretically.

Prerequisites:
probability basic
random sampling

Law of Large Numbers

As the number of trials increases, the experimental probability (sample average) converges to the theoretical probability (population mean).

Prerequisites:
probability basic
mean
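A quick simulation sketch, assuming a fair coin modeled with a seeded random number generator: the proportion of heads wanders early on and settles toward the theoretical probability 0.5 as the number of flips grows.

```python
import random

rng = random.Random(2024)  # seeded for reproducibility

heads = 0
for flips in range(1, 100_001):
    heads += rng.random() < 0.5  # a True comparison counts as 1 head
    if flips in (10, 100, 1_000, 10_000, 100_000):
        print(f"{flips:>7} flips: proportion of heads = {heads / flips:.4f}")
```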

Expected Value

The long-run average outcome of a random process, calculated as the sum of each outcome times its probability.

Prerequisites:
probability basic
weighted average

Hypothesis Testing

A formal procedure for using sample data to decide between two competing claims (hypotheses) about a population parameter.

Prerequisites:
sampling distribution
standard error
probability basic

P-Value

The probability of observing results at least as extreme as the actual data, assuming the null hypothesis is true.

Prerequisites:
hypothesis testing
probability basic
sampling distribution

Statistical Significance

A result is statistically significant when the p-value falls below a predetermined threshold ($\alpha$), typically 0.05, meaning results at least this extreme would be unlikely if the null hypothesis were true.

Prerequisites:
p value
hypothesis testing
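A sketch of one complete test, assuming an exact one-sided binomial test: the null hypothesis is that a coin is fair ($p = 1/2$), the alternative is that it favors heads, and the p-value is $P(X \ge \text{observed})$ under the null. The function name and data are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_p_value(successes, trials, p_null=Fraction(1, 2)):
    """One-sided p-value: P(X >= successes) assuming X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

alpha = 0.05
p = binomial_p_value(8, 10)  # 8 heads in 10 flips of a supposedly fair coin
print(float(p))              # 0.0546875
print("significant" if p < alpha else "not significant at alpha = 0.05")
```

Here 8 heads out of 10 gives a p-value just above 0.05, so the result is not significant at that threshold; 9 heads out of 10 would be.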

Correlation Coefficient

A number between $-1$ and $1$ that measures the strength and direction of the linear relationship between two variables.

Prerequisites:
correlation intro
line of best fit
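A sketch of Pearson's correlation coefficient, $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$. The function name and data are illustrative; Python 3.10+ also provides `statistics.correlation`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r([1, 2, 3, 4, 5], [52, 55, 61, 64, 68]), 3))  # 0.994 (strong, positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))                            # -1.0 (perfect, negative)
```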

Weighted Average

An average in which different values contribute unequally based on their assigned weights.

Prerequisites:
mean fair share
stat expected value
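A minimal sketch: multiply each value by its weight, add, and divide by the total weight. The course-grade weights are hypothetical.

```python
def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical course grade: homework 90 (20%), quizzes 80 (30%), final exam 70 (50%).
print(weighted_average([90, 80, 70], [20, 30, 50]))  # 77.0, versus an unweighted mean of 80
```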

Empirical Rule

In a normal distribution, about 68% of the data falls within $1\sigma$ of the mean, about 95% within $2\sigma$, and about 99.7% within $3\sigma$.

Prerequisites:
stat normal distribution
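The rule's percentages can be checked with the standard library's `statistics.NormalDist`, by taking the area under a standard normal curve between $-k$ and $k$:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

shares = {}
for k in (1, 2, 3):
    # Probability that a normal value lies within k standard deviations of the mean
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {shares[k]:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```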

Skewness

A measure of the asymmetry of a distribution: how much one tail stretches farther than the other. Right-skewed data have a longer right tail; left-skewed data have a longer left tail.

Prerequisites:
distribution shape