Relationships And Regression
11 concepts in Statistics
Relationships and regression focus on how two variables move together and how carefully that movement should be interpreted. Students read two-way tables, relative frequencies, conditional relative frequencies, scatter plots, lines of best fit, regression models, residuals, and correlation coefficients. They learn to spot association, quantify it, and model it without leaping too quickly to causation. A central lesson in this topic is that strong patterns can still be misleading if the wrong variables were measured, if lurking variables were ignored, or if a fitted model is treated like a proof. These ideas are essential for interpreting studies, evaluating claims in media, and understanding how statistical models are used in science, economics, and social research.
Suggested learning path: Start with two-way tables and relative frequencies, then move to scatter plots and association before fitting lines, reading residuals, and distinguishing correlation from causal explanation.
Two-Way Tables
A two-way table (contingency table) displays the frequencies of observations classified by two categorical variables at once, with one variable in rows and the other in columns, allowing comparison of distributions across groups.
Relative Frequency
Relative frequency is the fraction or percentage of times a value occurs out of the total number of observations. It converts raw counts into proportions, enabling fair comparisons between groups of different sizes.
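As a small illustration of turning counts into proportions, the sketch below computes relative frequencies for a list of survey answers. The data and the variable names (`responses`, `relative_freq`) are made up for this example.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration).
responses = ["bus", "car", "bus", "walk", "car", "bus", "bike", "bus"]

counts = Counter(responses)
total = sum(counts.values())

# Relative frequency = count of a value / total number of observations.
relative_freq = {value: count / total for value, count in counts.items()}

for value, share in relative_freq.items():
    print(f"{value}: {share:.3f}")
```

Because every share is divided by the same total, the relative frequencies always add up to 1, which is what makes groups of different sizes comparable.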
Conditional Relative Frequency
Conditional relative frequency is the proportion of cases in one group that also belong to another category, measured within a chosen row or column total of a two-way table. Joint and marginal relative frequencies describe the cell shares and row or column totals that support this calculation.
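The difference between joint, marginal, and conditional relative frequencies is easiest to see on a concrete table. The sketch below uses an invented two-way table of grade level by sports participation; the counts and names are assumptions for illustration only.

```python
# Hypothetical two-way table: rows are grade levels, columns are sports participation.
table = {
    "9th":  {"sports": 30, "no sports": 20},
    "10th": {"sports": 15, "no sports": 35},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
grand_total = sum(row_totals.values())

# Joint relative frequency: each cell divided by the grand total.
joint = {
    row: {col: n / grand_total for col, n in cells.items()}
    for row, cells in table.items()
}

# Marginal relative frequency: each row total divided by the grand total.
marginal_rows = {row: t / grand_total for row, t in row_totals.items()}

# Conditional relative frequency: each cell divided by its own row total.
conditional_by_row = {
    row: {col: n / row_totals[row] for col, n in cells.items()}
    for row, cells in table.items()
}

for row, shares in conditional_by_row.items():
    print(f"P(sports | {row}) = {shares['sports']:.2f}")
```

Here the conditioning is on rows. Dividing by column totals instead would answer a different question, namely what share of the students who play sports are in each grade.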
Correlation
Correlation is a statistical relationship between two variables where changes in one are associated with changes in the other. Positive correlation means both increase together; negative correlation means one increases as the other decreases; no correlation means no consistent pattern.
Correlation vs Causation
Correlation shows that two variables move together in some pattern; causation means one variable actually makes the other change. Observing a correlation does not prove causation because a hidden third variable (confounder) may be driving both.
Scatter Plot
A scatter plot is a graph that plots pairs of numerical values as points on a coordinate plane, revealing the relationship between two variables.
Line of Best Fit
The line of best fit (trend line) is the straight line that best represents the overall trend in a scatter plot by minimizing the sum of squared vertical distances between the line and all data points. Its equation enables predictions for new x-values.
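One way to see how the least-squares line is obtained is to compute it by hand. The sketch below uses the standard closed-form formulas, slope $= S_{xy} / S_{xx}$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = fit_line(hours, scores)
prediction = intercept + slope * 6  # predicted y for a new x-value of 6
print(f"y = {intercept:.2f} + {slope:.2f}x; prediction at x=6: {prediction:.2f}")
```

Note that x = 6 lies just outside the observed range of 1 to 5, so this prediction is an extrapolation and deserves extra caution.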
Linear Regression
Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables by fitting a linear function (a straight line when there is a single independent variable) that minimizes the sum of squared vertical distances from the data points to the fitted values (the least squares method).
Residuals
A residual is the difference between an observed data value and the value predicted by a statistical model, calculated as $\text{residual} = y_{\text{observed}} - y_{\text{predicted}}$. Positive residuals mean the model underestimated; negative residuals mean it overestimated.
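The sign convention can be checked on a few points. The sketch below assumes a fitted line $y = 2.2 + 0.6x$ and a small invented data set; both are illustrative rather than taken from any real study.

```python
# Hypothetical observations and an assumed fitted line y = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6

predicted = [intercept + slope * x for x in xs]

# residual = observed - predicted
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, y, e in zip(xs, ys, residuals):
    verdict = "underestimated" if e > 0 else "overestimated" if e < 0 else "exact"
    print(f"x={x}: observed={y}, residual={e:+.2f} ({verdict})")
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding, so a plot of residuals against x is usually more informative than their total.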
R-Squared (Coefficient of Determination)
R-squared (the coefficient of determination) is the proportion of variance in the dependent variable that is explained by the independent variable(s) in a regression model. For a least-squares linear model with an intercept, it ranges from 0 to 1, where 0 means the model explains none of the variability and 1 means it explains all of it.
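A common way to compute R-squared is $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$, where $SS_{\text{res}}$ is the sum of squared residuals and $SS_{\text{tot}}$ is the sum of squared deviations of the observed values from their mean. The sketch below applies that formula to invented data with an assumed fitted line $y = 2.2 + 0.6x$. The helper name `r_squared` is hypothetical.

```python
def r_squared(ys, predicted):
    """R^2 = 1 - SS_res / SS_tot for observed values and model predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data and an assumed fitted line y = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]

r2 = r_squared(ys, predicted)
print(f"R-squared: {r2:.2f}")
```

With these numbers $SS_{\text{res}} = 2.4$ and $SS_{\text{tot}} = 6$, so the line accounts for about 60% of the variability in y.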
Correlation Coefficient
The correlation coefficient (Pearson's r) is a number between −1 and 1 that measures both the strength and direction of the linear relationship between two quantitative variables. A value of 1 indicates a perfect positive linear relationship, −1 a perfect negative linear relationship, and 0 no linear relationship at all.
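To make the three reference values concrete, the sketch below computes Pearson's r as $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ for three invented data sets. The function name `pearson_r` and all data are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data sets, invented for illustration.
increasing = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])       # points on a rising line
decreasing = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])       # points on a falling line
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x^2, a symmetric curve

print(f"increasing: {increasing:+.2f}")
print(f"decreasing: {decreasing:+.2f}")
print(f"curved:     {curved:+.2f}")
```

The third case is the important caution: y is completely determined by x, yet r is 0 because the relationship is not linear. A value of r near 0 rules out a linear trend, not every kind of association.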