Linear Regression
Linear regression models the relationship between a dependent variable and one or more independent variables by fitting the line that best matches the data in the least-squares sense.
Definition
Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. In the simple case of one independent variable, it fits the straight line that minimizes the sum of squared vertical distances (residuals) between the data points and the line. This criterion is called the least squares method.
💡 Intuition
Given scattered points, draw the 'best' line through them. 'Best' means the line that's closest to all points on average. This line lets you predict Y from X.
🎯 Core Idea
Linear regression finds the line that minimizes the total of the squared prediction errors (least squares). The slope tells you how much the predicted Y changes for each one-unit increase in X.
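To make "minimizes total squared prediction errors" concrete, here is a minimal sketch that scores two candidate lines on a small data set. The data values and the function name `sum_squared_errors` are made up for illustration. The line y = 2.2 + 0.6x is the least-squares line for this particular data, and y = 1 + 1x is an arbitrary competitor.

```python
def sum_squared_errors(xs, ys, intercept, slope):
    """Add up the squared vertical misses between each point and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustration data, not taken from any real study.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for this data: y = 2.2 + 0.6x
sse_best = sum_squared_errors(xs, ys, intercept=2.2, slope=0.6)

# An arbitrary alternative line: y = 1 + 1x
sse_other = sum_squared_errors(xs, ys, intercept=1, slope=1)

print(round(sse_best, 2))  # about 2.4
print(sse_other)           # 4
```

Any other choice of intercept and slope gives a total at least as large as the least-squares line's total, which is what makes that line "best" under this criterion.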
Example
$\text{weight} = 4.5 \times \text{height} - 180$, with weight in pounds and height in inches.
For a person 70 inches tall, the predicted weight is $4.5(70) - 180 = 135$ lbs.
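The same arithmetic can be written as a small function. This is only a sketch of the example line above; the function name `predict_weight_lbs` is a hypothetical label chosen here.

```python
def predict_weight_lbs(height_in):
    """Predicted weight in pounds from height in inches, using the example line."""
    return 4.5 * height_in - 180

print(predict_weight_lbs(70))  # 135.0

# The slope is the change in predicted weight per extra inch of height.
print(predict_weight_lbs(71) - predict_weight_lbs(70))  # 4.5
```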
Notation
The fitted line is $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the predicted value, $b_0$ is the y-intercept, $b_1$ is the slope, and $x$ is the independent variable. The residual for the $i$-th observation is $e_i = y_i - \hat{y}_i$.
Why It Matters
Regression is one of the most widely used statistical tools. It powers predictions in science, business, and machine learning.
💭 Hint When Stuck
First, plot the data on a scatter plot to verify a linear pattern exists. Then use the least-squares formulas to find the slope and intercept of the best-fit line. Finally, check the residual plot for random scatter (no pattern) to confirm the linear model is appropriate.
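The fitting and residual steps above can be sketched in plain Python as follows. The data values and the helper names `fit_line` and `residuals` are made up for illustration, and the plotting steps are left out so the sketch needs only the standard library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed y minus predicted y for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up illustration data, not taken from any real study.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

print(round(b0, 2), round(b1, 2))  # 2.2 0.6
print([round(e, 2) for e in res])  # inspect these for a pattern
```

When the fitted line includes an intercept, the residuals always sum to (approximately) zero, so the useful check is whether they show a systematic pattern such as a curve or a fan shape when plotted against x.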
Formal View
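A sketch of the formal statement for the one-predictor case, using the notation above. The sample size $n$ and the sample means $\bar{x}$ and $\bar{y}$ are symbols introduced here.

```latex
\begin{aligned}
(b_0, b_1) &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_i) \bigr)^2 \\[4pt]
b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
\end{aligned}
```

The first line states the least-squares criterion, and the second gives the slope and intercept that satisfy it, provided the $x_i$ are not all equal.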
🚧 Common Stuck Point
Students extrapolate regression lines far beyond the data range. Predictions outside the observed data are unreliable because the linear relationship may not hold.
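One way to guard against this mistake is to refuse predictions outside the observed range of x. The sketch below reuses the weight-and-height example line; the function name `predict_within_range` and the list of observed heights are hypothetical and chosen only for illustration.

```python
def predict_within_range(x, xs_observed, intercept, slope):
    """Return the prediction, or None when x lies outside the observed x range."""
    lo, hi = min(xs_observed), max(xs_observed)
    if not lo <= x <= hi:
        return None  # extrapolation: the line may not hold here
    return intercept + slope * x

# Hypothetical heights (inches) that the line was fitted on.
observed_heights_in = [60, 64, 68, 72, 76]

inside = predict_within_range(70, observed_heights_in, intercept=-180, slope=4.5)
outside = predict_within_range(20, observed_heights_in, intercept=-180, slope=4.5)

print(inside)   # 135.0
print(outside)  # None
```

Without the guard, the line would predict $4.5(20) - 180 = -90$ lbs for a height of 20 inches, an impossible value that shows why the linear pattern cannot be trusted far from the data.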
⚠️ Common Mistakes
- Extrapolating beyond the range of the observed data
- Assuming that a fitted relationship implies causation
- Ignoring patterns in the residuals
Frequently Asked Questions
What is Linear Regression in Statistics?
It is a method for modeling how a dependent variable relates to one or more independent variables. With one independent variable, it fits the straight line that minimizes the sum of squared residuals (the least squares method).
When do you use Linear Regression?
Use it when you want to predict Y from X and a scatter plot of the data shows a roughly linear pattern. Fit the line with the least-squares formulas, then check that the residual plot shows random scatter with no pattern, which supports the choice of a linear model.
What do students usually get wrong about Linear Regression?
They extrapolate the regression line far beyond the range of the data. Predictions outside the observed range are unreliable because the linear relationship may not hold there.
How Linear Regression Connects to Other Ideas
To understand linear regression, you should first be comfortable with scatter plots, correlation, and the line of best fit. Once you have a solid grasp of linear regression, you can move on to residuals and R-squared ($R^2$).