Inference for Regression Formula

The Formula

t = \frac{b - \beta_{1,0}}{\text{SE}_b} \quad\text{where}\quad \text{SE}_b = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}}

When to use: You computed a sample regression line with slope b = 2.3. But is the true population slope actually different from zero? Maybe there's really no linear relationship and you just got a slope by chance. The regression t-test asks: 'Is my sample slope far enough from zero that it's unlikely to have occurred by random variation alone?'

Quick Example

Sample slope b = 2.3, \text{SE}_b = 0.8, n = 25. t = \frac{b - 0}{\text{SE}_b} = \frac{2.3}{0.8} = 2.875 \quad (df = 23) The p-value \approx 0.008 < 0.05, so reject H_0: \beta_1 = 0. There is evidence of a linear relationship.
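As a quick check, the arithmetic above can be reproduced in a few lines of Python (a minimal sketch, assuming SciPy is installed):

```python
from scipy import stats

b, se_b, n = 2.3, 0.8, 25             # values from the quick example
t = (b - 0) / se_b                    # test statistic: 2.875
df = n - 2                            # 25 - 2 = 23
p_value = 2 * stats.t.sf(abs(t), df)  # two-tailed p-value
print(f"t = {t}, df = {df}, p = {p_value:.4f}")
```

The two-tailed p-value comes out near 0.008, matching the conclusion above.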

Notation

b = sample slope, \beta_1 = population slope, \beta_{1,0} = hypothesized value of the slope under H_0 (usually 0), \text{SE}_b = standard error of the slope, s = standard deviation of the residuals, df = n - 2.

What This Formula Means

Inference for regression uses hypothesis tests and confidence intervals to draw conclusions about the true population slope \beta_1 of the linear relationship y = \beta_0 + \beta_1 x + \varepsilon, based on sample data.

Formal View

t = \frac{b - \beta_{1,0}}{\text{SE}_b} with df = n - 2 where \text{SE}_b = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}}; CI: b \pm t^* \cdot \text{SE}_b
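The whole pipeline, computing \text{SE}_b from raw data, forming the t statistic, and building the confidence interval, can be sketched as below. The data values here are hypothetical, chosen only to illustrate the formulas; `scipy.stats.linregress` can be used to cross-check the hand-computed standard error:

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration (not from the text)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 10.4, 11.8, 14.1, 16.3])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)                    # sum of (x_i - x-bar)^2
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx    # sample slope
a = y.mean() - b * x.mean()                          # intercept
resid = y - (a + b * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))            # residual std deviation
se_b = s / np.sqrt(sxx)                              # SE_b from the formula
t = b / se_b                                         # test of H0: beta_1 = 0
t_star = stats.t.ppf(0.975, n - 2)                   # t* for a 95% CI, df = n - 2
ci = (b - t_star * se_b, b + t_star * se_b)
print(f"b = {b:.4f}, SE_b = {se_b:.4f}, t = {t:.2f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that `stats.linregress(x, y).stderr` returns the same slope standard error, which makes it a convenient sanity check on the hand computation.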

Worked Examples

Example 1

Difficulty: medium
A regression output shows: slope b = 2.5, \text{SE}_b = 0.8, n = 30. Test H_0: \beta_1 = 0 vs H_a: \beta_1 \neq 0 at \alpha = 0.05 using a t-test.

Solution

  1. Test statistic: t = \frac{b - \beta_{1,0}}{\text{SE}_b} = \frac{2.5 - 0}{0.8} = 3.125
  2. Degrees of freedom: df = n - 2 = 30 - 2 = 28
  3. Critical value: t^*_{0.025, 28} \approx 2.048 (two-tailed at \alpha = 0.05)
  4. Since |t| = 3.125 > 2.048, reject H_0; the slope is significantly different from zero.

Answer

t=3.125 > 2.048. Reject H_0. The slope is statistically significant at \alpha=0.05.
Testing whether the slope equals zero tests whether x is a useful predictor of y. Rejecting H_0: \beta_1 = 0 means there is evidence of a linear relationship in the population. df = n - 2 because two parameters (slope and intercept) are estimated.
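The critical-value comparison in Example 1 can be verified with SciPy (a minimal sketch, assuming SciPy is available):

```python
from scipy import stats

b, se_b, n = 2.5, 0.8, 30           # regression output from Example 1
t = (b - 0) / se_b                  # test statistic: 3.125
t_crit = stats.t.ppf(0.975, n - 2)  # two-tailed critical value, df = 28
reject = abs(t) > t_crit
print(f"t = {t}, t* = {t_crit:.3f}, reject H0: {reject}")
```

`stats.t.ppf(0.975, 28)` evaluates to about 2.048, so the test rejects H_0, matching the worked solution.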

Example 2

Difficulty: hard
Construct a 95% confidence interval for the slope \beta_1 given: b = 1.8, \text{SE}_b = 0.5, n = 25, and t^*_{0.025, 23} = 2.069.

Solution

  1. Margin of error: t^* \cdot \text{SE}_b = 2.069 \times 0.5 = 1.0345
  2. Interval: b \pm t^* \cdot \text{SE}_b = 1.8 \pm 1.0345

Answer

The 95% confidence interval is (0.77, 2.83). Since 0 is not in the interval, this is consistent with rejecting H_0: \beta_1 = 0 at \alpha = 0.05.
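With the critical value already given, the interval is plain arithmetic; a minimal Python sketch:

```python
b, se_b, t_star = 1.8, 0.5, 2.069   # given values
margin = t_star * se_b              # margin of error: 1.0345
ci = (b - margin, b + margin)       # b +/- t* * SE_b
print(f"95% CI for the slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```

This yields an interval of roughly (0.77, 2.83); since 0 lies outside it, the interval is consistent with a nonzero population slope.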

Common Mistakes

  • Forgetting to check the conditions before performing inference—linearity, independence, normality of residuals, and equal variance must all be reasonable.
  • Using n - 1 degrees of freedom instead of n - 2—regression estimates two parameters (the slope and the intercept), so df = n - 2.
  • Interpreting a significant slope as proof of causation—regression inference tests for a linear association, but causation requires experimental design.

Why This Formula Matters

Computing a regression line is descriptive; regression inference tells you whether the relationship is statistically real or could be due to chance. This is how researchers establish that one variable genuinely predicts another.

Frequently Asked Questions

What is the Inference for Regression formula?

Using hypothesis tests and confidence intervals to draw conclusions about the true population slope \beta_1 of the linear relationship y = \beta_0 + \beta_1 x + \varepsilon, based on sample data.

How do you use the Inference for Regression formula?

You computed a sample regression line with slope b = 2.3. But is the true population slope actually different from zero? Maybe there's really no linear relationship and you just got a slope by chance. The regression t-test asks: 'Is my sample slope far enough from zero that it's unlikely to have occurred by random variation alone?'

What do the symbols mean in the Inference for Regression formula?

b = sample slope, \beta_1 = population slope, \text{SE}_b = standard error of the slope, s = standard deviation of residuals, df = n - 2.

Why is the Inference for Regression formula important in Math?

Computing a regression line is descriptive; regression inference tells you whether the relationship is statistically real or could be due to chance. This is how researchers establish that one variable genuinely predicts another.

What do students get wrong about Inference for Regression?

Students forget to check the conditions: (1) the residual plot should show no pattern, (2) residuals should be approximately normal, (3) the spread of residuals should be roughly constant across x.

What should I learn before the Inference for Regression formula?

Before studying the Inference for Regression formula, you should understand: linear regression (the least-squares regression line), residuals, r^2, hypothesis testing, and confidence intervals.