Center Spread And Distributions
19 concepts in Statistics
Center, spread, and distributions help students move from “What are the data values?” to “What is this dataset like overall?” This topic covers measures of center such as mean, median, and mode; measures of spread such as range, mean absolute deviation, interquartile range, and standard deviation; and the language used to describe distributions, including shape, outliers, percentiles, z-scores, skewness, and the normal distribution. Students learn that no single summary is enough by itself: a mean without spread can hide instability, and a graph without context can hide what is typical. These ideas are the backbone of descriptive statistics and the bridge into probability models and inference.
Suggested learning path: Begin with mean, median, mode, and range, then study variability and quartiles before moving into standard deviation, distribution shape, percentiles, z-scores, and normal-distribution ideas.
Mean as Fair Share
The mean (average) represents what each person would get if the total were divided equally among everyone. It is calculated by adding all values and dividing by the count, giving a single number that summarizes the center of the data.
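The fair-share idea can be sketched in a few lines of Python. The function name `fair_share` and the cookie counts are made up for illustration.

```python
def fair_share(values):
    """Return the mean: the total divided equally among all values."""
    return sum(values) / len(values)

# Hypothetical data: cookies held by four people.
cookies = [2, 5, 8, 9]
print(fair_share(cookies))  # 24 cookies / 4 people = 6.0
```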
Median
The median is the middle value when all data points are arranged in order from smallest to largest. Half the values lie above it and half below. For an even number of values, the median is the average of the two middle values.
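As a minimal sketch of that procedure, the function below sorts the data and handles the odd and even cases separately. The name `median_of` and the sample lists are illustrative only; Python's standard `statistics.median` does the same job.

```python
def median_of(values):
    """Return the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 3]))     # 3
print(median_of([7, 1, 3, 5]))  # 4.0
```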
Mode
The mode is the value that appears most often in a data set. A set can have no mode (all values appear equally), one mode (unimodal), or multiple modes (bimodal or multimodal). It is the only measure of center that works for categorical data.
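One way to sketch this in Python is to count occurrences and keep the most frequent values. The helper name `modes` is hypothetical, and it follows the convention above of reporting no mode when every distinct value is tied (some textbooks and libraries treat that case differently).

```python
from collections import Counter

def modes(values):
    """Return a list of the most frequent values, or [] if all are tied."""
    counts = Counter(values)
    top = max(counts.values())
    winners = [value for value, count in counts.items() if count == top]
    # Convention assumed here: if every distinct value is tied, report no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

print(modes(["red", "blue", "red"]))  # ['red']  (works for categories)
print(modes([1, 1, 2, 2, 3]))         # [1, 2]   (bimodal)
print(modes([4, 5, 6]))               # []       (no mode)
```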
Range
The range is the difference between the maximum and minimum values in a data set, giving the simplest measure of overall spread. It tells you the total span of the data from lowest to highest in a single number.
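In code this is a single subtraction. The function name `data_range` and the temperature readings below are invented for the example.

```python
def data_range(values):
    """Return the maximum minus the minimum."""
    return max(values) - min(values)

# Hypothetical daily temperatures.
temps = [61, 58, 70, 66, 59]
print(data_range(temps))  # 70 - 58 = 12
```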
Mean vs Median
Mean and median are both measures of center but respond differently to extreme values (outliers). The mean is pulled toward outliers because it uses every value in its calculation, while the median is resistant to outliers because it depends only on the middle position.
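A small experiment makes the difference visible. The sketch below uses Python's standard `statistics` module on made-up salary figures (in thousands), then adds one extreme value.

```python
from statistics import mean, median

# Hypothetical salaries in thousands.
typical = [40, 42, 45, 47, 50]
with_outlier = typical + [500]

print(mean(typical), median(typical))                      # 44.8 45
print(round(mean(with_outlier), 2), median(with_outlier))  # 120.67 46.0
```

The single outlier nearly triples the mean, while the median moves by only one unit.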
Spread vs Center
Center describes where the 'middle' of data lies; spread describes how far data extends from that center.
Data Variability
Data variability describes how much the values in a data set are spread out or clustered together around the center. High variability means values are widely scattered; low variability means they are tightly grouped near the average.
Quartiles
Quartiles are values that divide ordered data into four equal parts: $Q_1$ (25th percentile) marks the boundary below which 25% of data falls, $Q_2$ (the median, 50th percentile) splits the data in half, and $Q_3$ (75th percentile) marks the boundary below which 75% falls.
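Quartiles can be computed with `statistics.quantiles` from Python's standard library, as in the sketch below. Several quartile conventions exist, and they can give slightly different answers on small datasets; this example relies on the library's default method, and the data are invented.

```python
from statistics import quantiles

# Hypothetical data; quantiles() sorts it internally.
data = [7, 1, 5, 3, 2, 6, 4]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)  # 2.0 4.0 6.0
```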
Interquartile Range (IQR)
The interquartile range (IQR) is the range of the middle 50% of data, calculated as $Q_3 - Q_1$. It measures spread while ignoring the top and bottom 25% of values, making it resistant to outliers.
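The resistance to outliers can be checked directly. This sketch defines a hypothetical helper `iqr` on top of `statistics.quantiles` (using the library's default quartile method, which is one of several conventions) and compares it with the range on made-up data.

```python
from statistics import quantiles

def iqr(values):
    """Return Q3 - Q1 using the library's default quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7]
extreme = [1, 2, 3, 4, 5, 6, 700]  # same data with one extreme value

print(iqr(clean), iqr(extreme))                              # 4.0 4.0
print(max(clean) - min(clean), max(extreme) - min(extreme))  # 6 699
```

The IQR is unchanged by the extreme value, while the range jumps from 6 to 699.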
Mean Absolute Deviation (MAD)
The Mean Absolute Deviation (MAD) is the average of the absolute distances between each data point and the mean of the dataset. It measures how spread out data values are from the center, with larger MAD values indicating more variability.
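The calculation has two passes: find the mean, then average the absolute distances from it. A minimal sketch, with an illustrative function name and data:

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    center = sum(values) / len(values)
    return sum(abs(v - center) for v in values) / len(values)

# Mean is 5; distances are 3, 1, 1, 3; their average is 2.
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
```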
Standard Deviation
Standard deviation measures how spread out data values are from the mean. It is the square root of the average squared deviation from the mean, and it can be read as a typical distance of data points from the average. A small standard deviation means data clusters tightly around the mean; a large one means data is widely spread.
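As a sketch, the population standard deviation can be computed by hand as the square root of the average squared deviation from the mean, then compared with the standard library. The helper name `population_sd` and the data are illustrative. Note that `statistics.stdev` is the sample version, which divides by n - 1 instead of n and so gives a slightly larger value.

```python
import math
from statistics import pstdev, stdev

def population_sd(values):
    """Square root of the mean squared deviation from the mean."""
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

print(population_sd(data))  # sqrt(32 / 8) = 2.0
print(pstdev(data))         # library population version, same idea
print(stdev(data))          # sample version, divides by n - 1
```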
Distribution Shape
Distribution shape describes the overall pattern of how data values are spread when displayed in a histogram or dot plot. Common shapes include symmetric (for example, the bell curve), skewed left (a longer tail toward low values), skewed right (a longer tail toward high values), uniform (all values equally common), and bimodal (two peaks).
Outlier Detection
Outlier detection is the process of identifying data points that are unusually far from the rest of the dataset, using techniques like the IQR rule, z-scores, or visual inspection of box plots and scatter plots. These anomalous values may indicate measurement errors, data entry mistakes, or genuinely extreme observations.
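The IQR rule is commonly stated as: flag any value more than 1.5 × IQR below $Q_1$ or above $Q_3$. The sketch below assumes that multiplier (exposed as the hypothetical parameter `k`) and the default quartile method of `statistics.quantiles`; other quartile conventions can move the fences slightly.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Q1 = 2, Q3 = 6, IQR = 4, so the fences are -4 and 12.
print(iqr_outliers([1, 2, 3, 4, 5, 6, 700]))  # [700]
```

Flagging a value is only the first step; deciding whether it is an error or a genuine extreme still requires context.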
Percentiles
Percentiles are values that divide a ranked distribution into 100 equal parts. The $n$th percentile is the value below which $n\%$ of the data falls, telling you where a specific observation stands relative to the entire dataset.
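Reading the definition in the other direction gives the percentile rank of an observation: the percentage of the data that falls below it. A minimal sketch, using a hypothetical helper `percentile_rank` and invented test scores (some definitions count ties differently):

```python
def percentile_rank(values, x):
    """Percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Six of the ten scores are below 85.
print(percentile_rank(scores, 85))  # 60.0
```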
Normal Distribution
The normal distribution (bell curve) is a symmetric, bell-shaped probability distribution where most data clusters around the mean, with probabilities decreasing symmetrically toward the tails. It is defined by two parameters: the mean and the standard deviation.
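Python's standard library models this with `statistics.NormalDist`, which takes exactly those two parameters. The sketch below uses an illustrative mean of 100 and standard deviation of 15 to check the symmetry described above.

```python
import math
from statistics import NormalDist

# Illustrative parameters: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Half of the probability lies below the mean.
print(dist.cdf(100))  # 0.5

# The density is the same at equal distances on either side of the mean.
print(math.isclose(dist.pdf(85), dist.pdf(115)))  # True

# The density is highest at the mean and falls off toward the tails.
print(dist.pdf(100) > dist.pdf(115) > dist.pdf(130))  # True
```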
Z-Score (Standard Score)
A z-score tells you how many standard deviations a value is from the mean, calculated as $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation. Positive z-scores are above the mean; negative z-scores are below. Z-scores allow comparison of values from different distributions.
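The comparison across distributions can be sketched as follows. The function name `z_score` and the two exam scenarios are invented for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical exams with different means and spreads.
exam_a = z_score(85, mu=75, sigma=5)   # 85 on an exam with mean 75, SD 5
exam_b = z_score(90, mu=80, sigma=10)  # 90 on an exam with mean 80, SD 10

print(exam_a, exam_b)  # 2.0 1.0
```

Although 90 is the higher raw score, the 85 is the stronger result relative to its own exam.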
Weighted Average
A weighted average is an average in which different values contribute unequally based on their assigned weights, reflecting the relative importance or frequency of each value. Unlike a simple average where all values count equally, a weighted average gives more influence to values with larger weights.
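In code, each value is multiplied by its weight, and the total is divided by the sum of the weights. The sketch below uses a hypothetical course grade where homework, midterm, and final count in the ratio 2 : 3 : 5.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the total weight."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)

# Hypothetical scores for homework, midterm, and final.
scores = [90, 80, 70]
weights = [2, 3, 5]

print(weighted_average(scores, weights))  # (180 + 240 + 350) / 10 = 77.0
print(sum(scores) / len(scores))          # simple average: 80.0
```

The weighted result is lower than the simple average because the lowest score carries the largest weight.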
Empirical Rule
The empirical rule (also called the 68-95-99.7 rule) states that for a normal distribution, approximately 68% of data falls within one standard deviation of the mean, about 95% falls within two standard deviations, and roughly 99.7% falls within three standard deviations.
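These percentages can be checked numerically with `statistics.NormalDist` from Python's standard library. The helper name `share_within` is made up for this sketch; it uses the standard normal distribution (mean 0, standard deviation 1), and the same shares hold for any normal distribution.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Probability of landing within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```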
Skewness
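Skewness is a measure of how asymmetric a distribution is around its mean. Positive skew means the longer tail extends to the right (toward high values); negative skew means the longer tail extends to the left (toward low values).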
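One common way to put a number on skewness is the moment coefficient: the average cubed deviation from the mean, divided by the cube of the population standard deviation. The sketch below assumes that definition (others exist, and sample-adjusted versions differ slightly); the function name and data are illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mu) ** 3 for v in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))        # 0.0  (symmetric)
print(skewness([1, 2, 2, 3, 10]) > 0)   # True (long right tail)
print(skewness([1, 8, 9, 9, 10]) < 0)   # True (long left tail)
```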