linear_regression() performs bivariate or multiple linear regression
with SPSS-compatible output. Wraps stats::lm() and adds standardized
coefficients (Beta), a formatted ANOVA table, and a model summary matching
SPSS REGRESSION output.
Supports two interface styles:
Formula interface:
linear_regression(data, life_satisfaction ~ age + education)
SPSS-style:
linear_regression(data, dependent = life_satisfaction, predictors = c(age, education))
Usage
linear_regression(
data,
formula = NULL,
dependent = NULL,
predictors = NULL,
weights = NULL,
use = c("listwise", "pairwise"),
standardized = TRUE,
conf.level = 0.95
)
Arguments
- data
Your survey data (a data frame or tibble). If grouped (via dplyr::group_by()), separate regressions are run for each group.
- formula
A formula specifying the model (e.g., y ~ x1 + x2). If provided, dependent and predictors are ignored.
- dependent
The dependent variable (unquoted). Used with predictors when no formula is given.
- predictors
Predictor variable(s) (unquoted, supports tidyselect). Used with dependent when no formula is given.
- weights
Optional survey weights (unquoted variable name). When specified, weighted least squares (WLS) is used, matching SPSS WEIGHT BY.
- use
How to handle missing data: "listwise" (default) drops any case with a missing value on any variable (matching SPSS /MISSING LISTWISE); "pairwise" computes the regression from a pairwise covariance/correlation matrix, retaining more cases (matching SPSS /MISSING PAIRWISE).
- standardized
Logical. If TRUE (default), standardized coefficients (Beta) are calculated and included in the output.
- conf.level
Confidence level for coefficient intervals (default 0.95).
Value
An object of class "linear_regression" containing:
- coefficients
Tibble with B, Std.Error, Beta, t, p, CI_lower, CI_upper
- model_summary
List with R, R_squared, adj_R_squared, std_error
- anova
Tibble with Sum of Squares, df, Mean Square, F, Sig.
- descriptives
Tibble with Mean, Std.Deviation, N for all variables
- model
The underlying lm object
- formula
The formula used
- n
Sample size (listwise complete cases)
- dependent
Name of the dependent variable
- predictor_names
Names of predictor variables
- weighted
Logical indicating whether weights were used
- weight_name
Name of the weight variable (or NULL)
Use summary() for the full SPSS-style output with toggleable sections.
Details
Understanding the Results
The output includes four sections matching SPSS REGRESSION output:
Model Summary: R, R-squared, Adjusted R-squared, and Standard Error of the Estimate. R-squared tells you how much variance in the dependent variable is explained by the predictors.
ANOVA: Tests whether the overall model is significant. A significant F-test means at least one predictor matters.
Coefficients: B (unstandardized), Beta (standardized), t-value, p-value, and confidence intervals for each predictor.
Descriptives: Mean, SD, and N for all variables in the model.
Interpreting coefficients:
B (unstandardized): For each 1-unit increase in the predictor, the dependent variable changes by B units
Beta (standardized): Allows comparison across predictors with different scales. Larger absolute Beta = stronger effect
p-value: Values below 0.05 indicate statistically significant predictors
When to Use This
Use linear_regression() when:
Your dependent variable is continuous (e.g., income, satisfaction score)
You want to predict an outcome from one or more predictors
You need standardized coefficients to compare predictor importance
For binary outcomes (yes/no, 0/1), use logistic_regression instead.
Technical Details
Missing Data: By default, listwise deletion is used (matching SPSS
REGRESSION /MISSING LISTWISE). Set use = "pairwise" to match SPSS
/MISSING PAIRWISE, which computes the regression from a pairwise
covariance matrix. Pairwise deletion retains more cases and produces
results closer to SPSS output when data has varying patterns of missingness.
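The pairwise approach can be sketched in base R: each entry of the covariance matrix uses every case observed for that pair of variables, and the slopes come from the usual normal equations. This is an illustrative sketch with made-up data, not the package's internal code.

```r
# Illustrative sketch of pairwise estimation (not the package's internal code):
# build a covariance matrix where each entry uses all pairwise-complete cases,
# then solve the normal equations b = Sxx^{-1} Sxy for the slopes.
d <- data.frame(y  = c(1, 2, NA, 4, 5),
                x1 = c(2, NA, 3, 5, 6),
                x2 = c(1, 1, 2, NA, 3))
S <- cov(d, use = "pairwise.complete.obs")
b <- solve(S[c("x1", "x2"), c("x1", "x2")],  # slopes from pairwise covariances
           S[c("x1", "x2"), "y"])
```

Note that a pairwise covariance matrix is not guaranteed to be positive definite, a known caveat of this method under heavy missingness.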
Weights: When weights are specified, they are treated as frequency
weights (matching SPSS WEIGHT BY behavior). The model is fitted using weighted
least squares via lm(weights = ...).
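As a base-R sketch (illustrative data): passing frequency-style weights to lm() yields the same coefficient estimates as replicating each case w times, which is what SPSS WEIGHT BY effectively does.

```r
# Sketch: frequency-style weights via lm(weights = ...), illustrative data.
d <- data.frame(y = c(3, 5, 4, 6), x = c(1, 2, 3, 4), w = c(1, 2, 2, 1))
fit_w <- lm(y ~ x, data = d, weights = w)

# Coefficients match an unweighted fit on the case-replicated data
# (each row repeated w times, as under SPSS WEIGHT BY):
d_rep   <- d[rep(seq_len(nrow(d)), d$w), ]
fit_rep <- lm(y ~ x, data = d_rep)
all.equal(coef(fit_w), coef(fit_rep))
#> [1] TRUE
```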
Standardized Coefficients: Beta = B * (SD_x / SD_y). This matches the SPSS standardized coefficient output. Not available for the intercept.
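The identity can be checked against a plain stats::lm() fit: Beta equals the slope obtained after z-scoring all variables (a base-R sketch with simulated data; the variable names are illustrative).

```r
# Sketch (simulated data): Beta equals the slope from a fit on z-scored variables.
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 2 * d$x1 + 0.5 * d$x2 + rnorm(100)

fit  <- lm(y ~ x1 + x2, data = d)
b    <- coef(fit)[-1]                                  # unstandardized slopes
beta <- b * sapply(d[c("x1", "x2")], sd) / sd(d$y)     # Beta = B * (SD_x / SD_y)

fit_z <- lm(scale(y) ~ scale(x1) + scale(x2), data = d)
all.equal(unname(beta), unname(coef(fit_z)[-1]))
#> [1] TRUE
```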
Grouped Analysis: When data is grouped via
dplyr::group_by(), a separate regression is run for each group
(matching SPSS SPLIT FILE BY).
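A base-R sketch of the same split-file idea, fitting one lm() per group (simulated data; not the package's internal code):

```r
# Base-R sketch of split-file regression: one lm() per group (simulated data).
set.seed(2)
d <- data.frame(region = rep(c("East", "West"), each = 50),
                x = rnorm(100))
d$y <- 0.3 * d$x + rnorm(100)

fits <- lapply(split(d, d$region), function(dd) lm(y ~ x, data = dd))
sapply(fits, function(f) coef(f)[["x"]])   # per-group slopes
```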
See also
logistic_regression for binary outcome variables.
describe for checking variable distributions before regression.
pearson_cor for checking bivariate correlations.
summary.linear_regression for detailed output with toggleable sections.
Other regression:
logistic_regression()
Examples
library(dplyr)
data(survey_data)
# Bivariate regression
linear_regression(survey_data, life_satisfaction ~ age)
#> Linear Regression: life_satisfaction ~ age
#> R2 = 0.001, adj.R2 = 0.000, F(1, 2419) = 2.00, p = 0.158 , N = 2421
# Multiple regression
linear_regression(survey_data, income ~ age + education + life_satisfaction)
#> Linear Regression: income ~ age + education + life_satisfaction
#> R2 = 0.471, adj.R2 = 0.470, F(3, 2111) = 625.48, p < 0.001 ***, N = 2115
# SPSS-style interface
linear_regression(survey_data,
dependent = life_satisfaction,
predictors = c(trust_government, trust_media, trust_science))
#> Linear Regression: life_satisfaction ~ trust_government + trust_media + trust_science
#> R2 = 0.002, adj.R2 = 0.000, F(3, 2062) = 1.16, p = 0.322 , N = 2066
# Weighted regression
linear_regression(survey_data, life_satisfaction ~ age, weights = sampling_weight)
#> Linear Regression: life_satisfaction ~ age [Weighted]
#> R2 = 0.001, adj.R2 = 0.000, F(1, 2435) = 2.08, p = 0.150 , N = 2437
# Grouped by region
survey_data |>
dplyr::group_by(region) |>
linear_regression(life_satisfaction ~ age)
#> Linear Regression: life_satisfaction ~ age [Grouped: region]
#> region = East: R2 = 0.002, adj.R2 = -0.000, F(1, 463) = 0.88, p = 0.350 , N = 465
#> region = West: R2 = 0.001, adj.R2 = 0.000, F(1, 1954) = 1.20, p = 0.274 , N = 1956
# --- Three-layer output ---
result <- linear_regression(survey_data, life_satisfaction ~ age + income)
result # compact one-line overview
#> Linear Regression: life_satisfaction ~ age + income
#> R2 = 0.201, adj.R2 = 0.200, F(2, 2112) = 265.60, p < 0.001 ***, N = 2115
summary(result) # full detailed SPSS-style output
#>
#> Linear Regression Results
#> -------------------------
#> - Formula: life_satisfaction ~ age + income
#> - Method: ENTER (all predictors)
#> - N: 2115
#>
#> Model Summary
#> ------------------------------------------------------------
#> R 0.448
#> R Square 0.201
#> Adjusted R Square 0.200
#> Std. Error of Estimate 1.026
#> ------------------------------------------------------------
#>
#> ANOVA
#> ------------------------------------------------------------------------------
#> Source Sum of Squares df Mean Square F Sig.
#> ------------------------------------------------------------------------------
#> Regression 559.609 2 279.804 265.598 0.000 ***
#> Residual 2224.965 2112 1.053
#> Total 2784.574 2114
#> ------------------------------------------------------------------------------
#>
#> Coefficients
#> ----------------------------------------------------------------------------------------
#> Term B Std.Error Beta t Sig.
#> ----------------------------------------------------------------------------------------
#> (Intercept) 2.321 0.092 25.237 0.000 ***
#> age -0.001 0.001 -0.010 -0.508 0.611
#> income 0.000 0.000 0.448 23.037 0.000 ***
#> ----------------------------------------------------------------------------------------
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
summary(result, collinearity = FALSE) # hide VIF/Tolerance
#>
#> Linear Regression Results
#> -------------------------
#> - Formula: life_satisfaction ~ age + income
#> - Method: ENTER (all predictors)
#> - N: 2115
#>
#> Model Summary
#> ------------------------------------------------------------
#> R 0.448
#> R Square 0.201
#> Adjusted R Square 0.200
#> Std. Error of Estimate 1.026
#> ------------------------------------------------------------
#>
#> ANOVA
#> ------------------------------------------------------------------------------
#> Source Sum of Squares df Mean Square F Sig.
#> ------------------------------------------------------------------------------
#> Regression 559.609 2 279.804 265.598 0.000 ***
#> Residual 2224.965 2112 1.053
#> Total 2784.574 2114
#> ------------------------------------------------------------------------------
#>
#> Coefficients
#> ----------------------------------------------------------------------------------------
#> Term B Std.Error Beta t Sig.
#> ----------------------------------------------------------------------------------------
#> (Intercept) 2.321 0.092 25.237 0.000 ***
#> age -0.001 0.001 -0.010 -0.508 0.611
#> income 0.000 0.000 0.448 23.037 0.000 ***
#> ----------------------------------------------------------------------------------------
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
