Read the first worked example with the solution open so the structure is clear.
Try the practice problems before revealing each solution.
What to Focus On
Core idea: Hidden variables are influential factors left out of your model that can distort the relationships you do see.
Common stuck point: Applying the procedure is the easy part; the trap is jumping from correlation to causation. Asking "Could an unmeasured third factor be driving both variables I'm relating?" before anything else is what keeps a correct-looking calculation from being attached to the wrong interpretation.
Worked Examples
Example 1
easy
A formula gives a car's stopping distance as $d = 0.044v^2 + 0.75v$, with $d$ in metres and $v$ in km/h. Identify the hidden variables that this formula ignores.
Answer
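Hidden variables: road friction, tyre condition, variable reaction time, and road slope.
Hidden variable 1: road surface friction. Wet, icy, or loose surfaces lengthen braking distance, yet the $0.044v^2$ braking term assumes one fixed road condition.
Hidden variable 2: tyre quality and pressure, which affect grip.
Hidden variable 3: variation in driver reaction time. The $0.75v$ term assumes a fixed reaction time.
Hidden variable 4: road slope. Braking downhill takes noticeably more distance than braking uphill.
The formula is a simplified model, valid only under its assumed conditions.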
Hidden variables are factors that influence the output but are not explicitly represented in the model. Identifying them reveals the model's limitations and conditions of validity.
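To see where the hidden variables sit, the sketch below rewrites the formula with two of them exposed as parameters. Because $(v/3.6) \times 2.7 = 0.75v$, the reaction term silently fixes reaction time at 2.7 s. The parameter names `reaction_s` and `braking_coeff` are illustrative assumptions. `braking_coeff` simply lumps friction, tyres, and slope into one number rather than modelling them physically.

```python
def stopping_distance(v_kmh: float) -> float:
    """Original model: d = 0.044 v^2 + 0.75 v, d in metres, v in km/h."""
    return 0.044 * v_kmh**2 + 0.75 * v_kmh


def stopping_distance_exposed(v_kmh: float,
                              reaction_s: float = 2.7,
                              braking_coeff: float = 0.044) -> float:
    """Same model with hidden variables made explicit (hypothetical parameters).

    reaction_s: driver reaction time in seconds. v / 3.6 converts km/h to m/s,
        and (v / 3.6) * 2.7 equals 0.75 v, so the default reproduces the original term.
    braking_coeff: stands in for road friction, tyre grip and slope, which the
        original formula folds into the single constant 0.044.
    """
    reaction_distance = (v_kmh / 3.6) * reaction_s
    braking_distance = braking_coeff * v_kmh**2
    return reaction_distance + braking_distance


if __name__ == "__main__":
    v = 50.0
    print(stopping_distance(v))                               # original formula
    print(stopping_distance_exposed(v))                       # defaults agree with it
    print(stopping_distance_exposed(v, reaction_s=1.5))       # quicker driver
    print(stopping_distance_exposed(v, braking_coeff=0.088))  # poorer grip (illustrative)
```

With the defaults, both functions give 147.5 m at 50 km/h. Changing either parameter moves the answer, and that is exactly the information the original formula hides.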
Example 2
medium
The correlation between shoe size and reading ability in children appears strong. Identify the hidden variable and explain why correlation does not imply causation here.
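A small simulation makes the confounding concrete. Purely for illustration, it assumes that age drives both shoe size and reading score and that the two are otherwise unrelated. The coefficients and noise levels are made up. Pooled across ages the correlation is strong, but within a single age group it almost vanishes.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=7000, seed=1):
    """Hypothetical data: age (6 to 12) drives both variables; no direct link."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(6, 12)
        shoe = 0.8 * age + rng.gauss(0, 0.5)    # feet grow with age
        reading = 10 * age + rng.gauss(0, 5)    # reading improves with age
        rows.append((age, shoe, reading))
    return rows


rows = simulate()
overall = pearson([r[1] for r in rows], [r[2] for r in rows])
age9 = [r for r in rows if r[0] == 9]           # hold the hidden variable fixed
within = pearson([r[1] for r in age9], [r[2] for r in age9])
print(f"pooled r = {overall:.2f}, r within age 9 = {within:.2f}")
```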
Example 3
medium
A linear model y=3x holds in summer. In winter, the same data show y=1.5x. Identify the hidden variable and write the augmented model.
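One way to see why the season matters is to fit a single slope through the origin to pooled summer and winter data, then compare it with a model that carries a season indicator as an interaction term. The data points are hypothetical and noise-free. The indicator (1 in winter, 0 in summer) is an assumption for illustration.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope for the model y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


xs = [1.0, 2.0, 3.0, 4.0]
summer = [3.0 * x for x in xs]    # y = 3x in summer
winter = [1.5 * x for x in xs]    # y = 1.5x in winter

# Ignoring the season: one slope that fits neither season.
pooled = slope_through_origin(xs + xs, summer + winter)


def augmented(x, winter_flag):
    """Augmented model y = (3 - 1.5 * winter_flag) * x, winter_flag in {0, 1}."""
    return (3.0 - 1.5 * winter_flag) * x


print(pooled)                                  # 2.25, between the two seasonal slopes
print(augmented(2.0, 0), augmented(2.0, 1))    # 6.0 3.0
```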
Example 4
medium
Two batches of cookies: batch 1 has mean weight 30 g and batch 2 has mean weight 32 g. Within each oven batch 2 is lighter, yet pooled across all ovens it is heavier. Identify the hidden grouping.
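Batch 2 has the higher pooled mean (32 g vs 30 g), so any within-oven reversal must show batch 2 lighter in each oven. The counts below are one hypothetical construction of that pattern. The trick is the oven mix: most of batch 2 was baked in oven B, which produces heavier cookies. Oven labels and weights are invented, and every cookie in a cell is given the same weight for simplicity.

```python
# (number of cookies, weight of each cookie in grams), hypothetical data
data = {
    "batch 1": {"oven A": (30, 28.0), "oven B": (10, 36.0)},
    "batch 2": {"oven A": (15, 27.0), "oven B": (25, 35.0)},
}


def pooled_mean(batch):
    """Mean weight over all cookies in a batch, ignoring the oven."""
    total_weight = sum(n * w for n, w in data[batch].values())
    total_count = sum(n for n, _ in data[batch].values())
    return total_weight / total_count


for oven in ("oven A", "oven B"):
    w1, w2 = data["batch 1"][oven][1], data["batch 2"][oven][1]
    print(oven, "batch 2 lighter:", w2 < w1)

print("pooled:", pooled_mean("batch 1"), pooled_mean("batch 2"))  # 30.0 32.0
```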
Example 5
hard
In the model $y = b_1x + b_2z + \varepsilon$ with $z = 0.5x + u$ and true values $b_1 = 1$, $b_2 = 4$, predict the omitted-variable bias when $z$ is dropped.
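The prediction can be checked with a quick Monte Carlo run. The problem does not specify distributions, so the sketch assumes x, u, and ε are independent standard normal draws as an illustrative choice. Omitted-variable reasoning says the short-regression slope should settle near $b_1 + b_2 \times 0.5$ rather than near $b_1$.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
b1, b2, c = 1.0, 4.0, 0.5
xs, ys = [], []
for _ in range(100_000):
    x = rng.gauss(0, 1)
    z = c * x + rng.gauss(0, 1)               # z = 0.5x + u
    y = b1 * x + b2 * z + rng.gauss(0, 1)     # true model includes z
    xs.append(x)
    ys.append(y)

short = ols_slope(xs, ys)                      # regression that omits z
print(f"estimated slope {short:.3f}, true b1 = {b1}, b1 + b2*c = {b1 + b2 * c}")
```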
Practice Problems
Try these problems on your own first, then open the solution to compare your method.
Example 1
easy
The area formula A=lw has hidden units. If l=5 m and w=3 m, what is A, and what hidden variable (units) must be tracked?
Example 2
medium
In the equation x+3=7, if x is constrained to be a natural number, solve it. If x must be a real number, what changes? Identify the hidden variable (domain).
Example 3
easy
Ice cream sales and drowning deaths both rise in summer. A student concludes ice cream causes drowning. What hidden variable explains both?
Example 4
easy
A model predicts a student's test score from hours studied but ignores sleep. Sleep affects scores. What kind of variable is sleep here?
Example 5
easy
Two cities have the same average temperature but one feels far hotter. What hidden variable likely explains the difference?
Example 6
easy
A coin appears biased: 7 heads in 10 flips. Before concluding bias, what hidden factor should you consider?
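Chance is the first hidden factor to rule out. With only 10 flips, a fair coin lands 7 or more heads fairly often. The exact binomial tail below uses only the standard library (`math.comb` needs Python 3.8+).

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


tail = prob_at_least(7, 10)
print(tail)   # 176/1024 = 0.171875
```

A result that a fair coin produces about 17% of the time is weak evidence of bias on its own.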
Example 7
easy
A store finds taller shelves sell more, so stocks taller shelves. Sales drop. What hidden variable might have driven the original pattern?
Example 8
easy
In the equation distance = rate × time, a runner's actual distance is shorter than predicted. What hidden variable could explain it?
Example 9
easy
A survey of gym members finds people who exercise are healthier. What hidden variable threatens the conclusion that exercise causes health?
Example 10
easy
A formula models a falling object's fall time using only height, ignoring air resistance. For a feather it fails badly. What hidden variable matters?
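The sketch below compares the vacuum model $t = \sqrt{2h/g}$ with a quadratic-drag model. In the drag model, the fall time from rest is $t = (v_t/g)\,\operatorname{arcosh}(e^{gh/v_t^2})$ for terminal velocity $v_t$. The terminal velocities used here are rough illustrative assumptions, not measured values.

```python
import math

G = 9.81  # m/s^2


def fall_time_vacuum(h):
    """Fall time from rest with no air resistance: t = sqrt(2h/g)."""
    return math.sqrt(2 * h / G)


def fall_time_drag(h, v_term):
    """Fall time from rest with quadratic drag and terminal velocity v_term.

    Distance fallen is (v_term**2 / g) * ln(cosh(g t / v_term)), so
    t = (v_term / g) * arcosh(exp(a)) with a = g h / v_term**2.
    arcosh(exp(a)) is rewritten as a + log1p(sqrt(1 - exp(-2a)))
    so that exp(a) never overflows for large a.
    """
    a = G * h / v_term**2
    return (v_term / G) * (a + math.log1p(math.sqrt(-math.expm1(-2 * a))))


h = 2.0
print(fall_time_vacuum(h))         # about 0.64 s for any object
print(fall_time_drag(h, 0.5))      # feather-like v_term (assumed): about 4 s
print(fall_time_drag(h, 1000.0))   # negligible drag: close to the vacuum value
```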
Example 11
medium
Hospitals X and Y both report 80% survival overall, yet within both mild and severe cases Hospital X beats Hospital Y. Which hidden variable erases X's advantage in the pooled rates (Simpson's paradox)?
Example 12
medium
A linear fit of y on x has slope 2. After adding an omitted variable z (correlated with both), the slope of x drops to 0.5. What does this reveal about the original model?
Example 13
medium
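A test for a rare disease (1% prevalence) is 90% accurate; take this to mean 90% sensitivity and 90% specificity. A patient tests positive. The hidden factor is the base rate: because so few people have the disease, false positives can outnumber true positives. What is the approximate probability that the patient actually has the disease?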
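Bayes' rule makes the base-rate effect explicit. This sketch assumes "90% accurate" means that sensitivity and specificity are both 0.90. The same function handles the 2% / 95% / 90% case in Example 39.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence                 # sick and flagged
    false_pos = (1 - specificity) * (1 - prevalence)    # healthy but flagged
    return true_pos / (true_pos + false_pos)


print(posterior_positive(0.01, 0.90, 0.90))   # about 0.083
```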
Example 14
medium
Two students score identically on a test, but one cheated. A model using only score predicts equal ability. What hidden variable invalidates the prediction?
Example 15
medium
Sales rose after an ad campaign, but it was also the holiday season. To isolate the ad's effect, what hidden variable must you control for?
Example 16
medium
A function f(x, y) is being studied by varying x only, and the results look random. The hidden variable y is also changing, uncontrolled. What experimental fix isolates the effect of x?
Example 17
medium
A company finds employees with bigger offices earn more, and concludes office size raises pay. Name the most likely hidden variable and the true causal direction.
Example 18
challenge
In a regression $y = b_0 + b_1x$, the true model is $y = b_0 + b_1x + b_2z + e$, with $z = cx + u$, where $u$ and $e$ are uncorrelated with $x$. Show that omitting $z$ makes the estimated $x$-coefficient converge to $b_1 + b_2c$, and state the condition under which omission causes no bias.
Example 19
challenge
A latent factor model has observed score s = a*ability + b*coaching, with coaching unobserved and correlated with ability (corr rho > 0). If a researcher uses s as a pure proxy for ability, in which direction is ability over- or under-stated for heavily coached students, and why?
Example 20
challenge
Across 5 years, a treatment's yearly success rate exceeds control's every year, yet pooled over all years control wins. Construct the hidden-variable condition (counts) that makes this possible and name the phenomenon.
Example 21
medium
A study finds students who use tutoring score lower. Before concluding tutoring hurts, what hidden variable likely explains the reverse-looking result?
Example 22
medium
Two factories report the same average defect rate, but one has wildly inconsistent daily rates. What hidden variable does a single average conceal?
Example 23
easy
A model predicts crop yield using only fertilizer amount, ignoring rainfall. In a drought year, the model fails. What hidden variable explains the failure?
Example 24
easy
Sales of sunscreen and the count of mosquito bites both peak in July. A blogger says sunscreen attracts mosquitoes. Name the hidden variable.
Example 25
easy
A scatterplot of x and y shows a clean positive trend. A friend insists x must cause y. Give one reason this conclusion can be wrong.
Example 26
easy
A car-pricing formula uses only mileage and year. Two cars match on both, yet one sells for far more. Name a likely hidden variable.
Example 27
easy
A weather app predicts hike difficulty using only distance. Two hikes are 5 km but one is twice as exhausting. Name a hidden variable.
Example 28
easy
In an experiment varying temperature on a chemical reaction, pressure is allowed to drift. What hidden variable should be controlled?
Example 29
medium
A simple regression y=a+bx fits perfectly on training data with b=4. Adding a measured variable z yields y=a′+0.5x+3.5z with z=x. Explain what was hidden.
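Because $z = x$ exactly, the two columns carry identical information, and the data cannot tell their coefficients apart: any split whose coefficients on x and z sum to 4 fits just as well. The sketch uses hypothetical noise-free data and an assumed intercept of 1 to show this.

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
a = 1.0                                  # assumed intercept (illustrative)
ys = [a + 4.0 * x for x in xs]           # training data: y = a + 4x exactly
zs = list(xs)                            # the "new" variable is a copy of x


def sse(b_x, b_z):
    """Sum of squared errors of y = a + b_x * x + b_z * z on the data."""
    return sum((y - (a + b_x * x + b_z * z)) ** 2 for x, z, y in zip(xs, zs, ys))


for b_x, b_z in [(4.0, 0.0), (0.5, 3.5), (2.0, 2.0), (-10.0, 14.0)]:
    print(b_x, b_z, sse(b_x, b_z))       # every split summing to 4 fits perfectly
```

An exact copy is the extreme case. With real, nearly collinear predictors, the split is not arbitrary but becomes very unstable.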
Example 30
medium
A 2x2 table compares treatment A vs B by gender. Within each gender A wins; pooled, B wins. Which hidden variable produces the reversal?
Example 31
medium
Test of a new drug: success rate 70% in volunteers, 40% in the general population. Identify the hidden variable driving the gap.
Example 32
medium
A school's average SAT rises after admitting only students who scored above 1200 on a prep test. Name the hidden variable behind the rise.
Example 33
medium
A factory's defect rate drops after installing new lights. Production volume also doubles. Name a hidden variable that could explain the drop.
Example 34
medium
Children with bigger feet read better on average. Researchers should control for what hidden variable?
Example 35
medium
To test if exercise reduces cholesterol, randomize 200 participants to exercise or control. Why does randomization neutralize hidden variables?
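Here is a minimal simulation of why randomization works. It assumes a hidden "baseline fitness" score drawn from a standard normal distribution, an illustrative stand-in for any unmeasured factor. When people choose their own group, the fittest pile into the exercise arm. A random split leaves the hidden variable roughly balanced.

```python
import random

rng = random.Random(42)
n = 200
fitness = [rng.gauss(0, 1) for _ in range(n)]    # hidden variable, never measured


def mean(values):
    return sum(values) / len(values)


# Self-selection: the 100 fittest people choose the exercise group.
ranked = sorted(fitness, reverse=True)
selected_gap = mean(ranked[:100]) - mean(ranked[100:])

# Randomization: shuffle everyone, then split into two arms of 100.
shuffled = fitness[:]
rng.shuffle(shuffled)
random_gap = mean(shuffled[:100]) - mean(shuffled[100:])

print(f"self-selected gap: {selected_gap:.2f}, randomized gap: {random_gap:.2f}")
```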
Example 36
hard
In an observational study, smoking and lung cancer are correlated. List two hidden-variable explanations and one design that rules them out.
Example 37
hard
A regression of wages on years of education produces slope 0.10. After adding IQ, the slope drops to 0.06. Compute the share of the original slope attributable to IQ-correlated effects.
Example 38
hard
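A report says drug A is better than drug B in adults and in children separately, but worse overall. The counts adult 90/100 vs 60/100 and child 10/100 vs 30/100 (A vs B) do not show this pattern: A loses among children and wins overall. Construct counts that do, and confirm by computing the rates.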
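Here is one hypothetical set of counts that produces the reversal, checked in code. The key ingredient is unequal allocation. A is given mostly to children, the harder group, while B is given mostly to adults. The numbers are invented for illustration. Exact fractions avoid any rounding doubt in the comparisons.

```python
from fractions import Fraction

# (successes, patients), hypothetical counts
counts = {
    "adults":   {"A": (90, 100),   "B": (850, 1000)},
    "children": {"A": (300, 1000), "B": (25, 100)},
}


def rate(successes, patients):
    return Fraction(successes, patients)


for group, arms in counts.items():
    print(group, "A better:", rate(*arms["A"]) > rate(*arms["B"]))


def pooled(arm):
    """Success rate for one drug with the age groups merged."""
    s = sum(counts[g][arm][0] for g in counts)
    n = sum(counts[g][arm][1] for g in counts)
    return rate(s, n)


print("pooled A:", float(pooled("A")), "pooled B:", float(pooled("B")))
```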
Example 39
hard
A disease has prevalence 2%; a test has sensitivity 95% and specificity 90%. Compute the posterior probability of disease given a positive test.
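Writing Bayes' rule out term by term shows how the low prevalence drags the answer down. The denominator is the total probability of a positive result, which is true positives plus false positives:

```latex
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
            = \frac{0.95 \times 0.02}{0.95 \times 0.02 + 0.10 \times 0.98}
            = \frac{0.019}{0.117} \approx 0.16
```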
Example 40
hard
A startup ad performs well in geo A and geo B in tests but flops at national scale. Identify two hidden variables that could explain it.
Example 41
challenge
In a structural equation $y = \alpha + \beta x + \gamma u + \varepsilon$ with $u$ unobserved and $\operatorname{Cov}(x, u) = \sigma_{xu}$, derive the bias of the OLS estimator of $\beta$.
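Here is a sketch of the standard omitted-variable derivation. It assumes $\varepsilon$ is uncorrelated with $x$ and writes $\sigma_x^2 = \operatorname{Var}(x)$. In large samples, the regression of $y$ on $x$ alone estimates $\operatorname{Cov}(x, y)/\operatorname{Var}(x)$:

```latex
\hat\beta \xrightarrow{p} \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
  = \frac{\operatorname{Cov}(x,\, \alpha + \beta x + \gamma u + \varepsilon)}{\sigma_x^2}
  = \frac{\beta\,\sigma_x^2 + \gamma\,\sigma_{xu} + 0}{\sigma_x^2}
  = \beta + \gamma\,\frac{\sigma_{xu}}{\sigma_x^2}
```

The large-sample bias is therefore $\gamma\,\sigma_{xu}/\sigma_x^2$, which vanishes only if $\gamma = 0$ or $\sigma_{xu} = 0$. Setting $u = z$, $\gamma = b_2$, and $\sigma_{xu} = c\,\sigma_x^2$ recovers the $b_1 + b_2c$ result of Example 18.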
Example 42
challenge
Two regions show the same average household income $50,000, but region A has a Gini of 0.25 and region B has 0.55. What hidden variable does a single mean conceal?
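The Gini coefficient summarises the distribution that a mean hides. This sketch applies the mean-absolute-difference formula $G = \sum_i \sum_j |x_i - x_j| / (2n^2\bar{x})$ to two invented five-household samples, both with mean income 50,000. The samples illustrate the effect only; they do not reproduce the 0.25 and 0.55 in the problem.

```python
def gini(incomes):
    """Gini coefficient via mean absolute difference over all ordered pairs."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_diff / (2 * n * n * mean)


region_a = [40_000, 45_000, 50_000, 55_000, 60_000]    # hypothetical, fairly equal
region_b = [10_000, 20_000, 30_000, 40_000, 150_000]   # hypothetical, one high earner

for name, incomes in (("A", region_a), ("B", region_b)):
    print(name, sum(incomes) / len(incomes), round(gini(incomes), 2))
```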