Data Quality Concepts

2 concepts · Grades 6-8, 9-12

This family view narrows the full statistics map to one connected cluster. Read it from left to right: earlier nodes support later ones, and dense middle sections usually mark the concepts that the most later topics depend on.

Use the graph to plan review, then use the full concept list below to open precise pages for definitions, examples, and related content.

Concept Dependency Graph

Concepts flow left to right, from foundational to advanced.

Connected Families

Data Quality concepts have 7 connections to other families.

All Data Quality Concepts

Outlier Detection

9-12

Outlier detection is the process of identifying data points that are unusually far from the rest of the dataset, using techniques like the IQR rule, z-scores, or visual inspection of box plots and scatter plots. These anomalous values may indicate measurement errors, data entry mistakes, or genuinely extreme observations.
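The IQR rule and z-scores named above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the function names, the made-up height data (in inches), and the cutoffs of 1.5 and 3 are assumptions chosen to match common classroom conventions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is a convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Return values whose distance from the mean exceeds `threshold` sample SDs."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, so nothing can be unusual
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical class heights in inches, including one 7-foot (84 in) student.
heights_in = [60, 61, 62, 62, 63, 63, 64, 64, 65, 66, 84]

print(iqr_outliers(heights_in))
print(zscore_outliers(heights_in))
print(zscore_outliers(heights_in, threshold=2.5))
```

With these made-up heights, the IQR rule flags only the 84-inch student. The z-score rule with the usual cutoff of 3 flags nothing, because in a sample this small the extreme value inflates the very standard deviation it is measured against; lowering the cutoff to 2.5 flags it. That difference is one reason to check more than one technique.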

"Outliers are data points that don't fit the pattern. A 7-foot student in a class of average heights, or a \$10 million house in a neighborhood of \$300k homes. They may be errors or genuinely unusual."

Why it matters: Outliers can distort statistics such as the mean and standard deviation, and can pull a fitted regression line away from the bulk of the data. Detecting them lets you investigate whether they are errors to fix, special cases to study separately, or genuine extremes that reveal important information.
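To see how a single outlier distorts the mean and standard deviation, here is a small sketch using invented home prices (in thousands of dollars) modeled on the neighborhood example. The `summarize` helper and the data are assumptions made for illustration only.

```python
import statistics

def summarize(values):
    """Return the mean, median, and sample standard deviation of `values`."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

# Hypothetical neighborhood prices, in thousands of dollars.
typical_homes = [290, 295, 300, 300, 305, 310]
with_mansion = typical_homes + [10_000]  # add one $10 million house

before = summarize(typical_homes)
after = summarize(with_mansion)

print(before)  # mean and median are both 300
print(after)   # mean jumps to about 1686; median stays at 300
```

The median barely notices the extra house, while the mean and standard deviation change dramatically. That contrast is why the median is often reported alongside the mean when outliers are suspected.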

Sampling Bias

6-8

Sampling bias occurs when a sample is collected in a way that systematically makes some members of the population more likely to be included than others. The results then fail to represent the full population and can lead to misleading conclusions.

"Asking only your friends about favorite music doesn't tell you what the whole school thinks - your friends probably have similar tastes! That's bias. A good sample is like a well-shuffled deck: everyone has an equal chance of being picked."

Why it matters: Biased samples lead to wrong conclusions in election polls, medical research, and market surveys. The famous 1936 Literary Digest poll predicted that Alf Landon would defeat Franklin D. Roosevelt because it drew its sample largely from telephone directories and automobile registration lists, which under-represented lower-income voters.
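The friends-and-music example can be turned into a small simulation. This is a sketch under invented numbers: a school of 1,000 students in which 30% like pop music, and a friend group of 50 in which 90% do. The names `share`, `friends`, and `others`, and the fixed random seed, are assumptions for illustration.

```python
import random

def share(sample):
    """Fraction of True values (students who like pop) in a sample."""
    return sum(sample) / len(sample)

# Hypothetical school: True means the student likes pop music.
friends = [True] * 45 + [False] * 5      # your friend group: 90% like pop
others = [True] * 255 + [False] * 695    # everyone else
population = friends + others            # 300 of 1,000 like pop overall

rng = random.Random(0)                   # fixed seed so the run is repeatable
random_sample = rng.sample(population, 100)  # every student equally likely

true_share = share(population)
biased_estimate = share(friends)         # asking only your friends
random_estimate = share(random_sample)   # a well-shuffled sample

print(f"true share:      {true_share:.2f}")
print(f"friends only:    {biased_estimate:.2f}")
print(f"random sample:   {random_estimate:.2f}")
```

Asking only the friend group gives an estimate of 0.90 when the true share is 0.30. The random sample will not match the true share exactly, but its error comes from chance rather than from who was allowed into the sample, so it tends to shrink as the sample grows.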