
Professional statistical analysis for survey data in R.

mariposa (Marburg Initiative for Political and Social Analysis) provides 76 functions for importing, managing, transforming, and analyzing survey data. It covers the full workflow: data import (SPSS, Stata, SAS, Excel), label management, recoding, and standardization, followed by statistical analysis with survey weights, grouped operations via dplyr::group_by(), and publication-ready output. All statistical results are validated against SPSS v29, so analyses can be reproduced in either tool.

Installation

# Install from GitHub (requires the devtools package)
# install.packages("devtools")
devtools::install_github("YannickDiehl/mariposa")

Quick Start

library(mariposa)
library(dplyr)

# Load example survey data (2,500 respondents)
data(survey_data)

# Interactive HTML codebook in the RStudio Viewer
codebook(survey_data)

# Descriptive statistics with survey weights
survey_data %>%
  describe(age, income, life_satisfaction, weights = sampling_weight)

# Frequency table
survey_data %>%
  frequency(education, weights = sampling_weight)

# Compare groups with t-test
survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)

# Detailed SPSS-style output with summary()
survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight) %>%
  summary()

# Scale analysis workflow
reliability(survey_data, trust_government, trust_media, trust_science) %>%
  summary()    # item statistics, inter-item correlations

# Build a mean index from the three trust items
survey_data <- survey_data %>%
  mutate(m_trust = row_means(., trust_government, trust_media, trust_science))

# Regression
survey_data %>%
  linear_regression(life_satisfaction ~ age + income, weights = sampling_weight) %>%
  summary()    # coefficients, ANOVA table, diagnostics
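For readers who want to see what the reliability() and row_means() steps above compute, the textbook Cronbach's alpha formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score), fits in a few lines of base R. This is a minimal unweighted sketch with listwise deletion; the helper `cronbach_alpha()` and the simulated items are hypothetical and are not mariposa's internal implementation.

```r
# Hypothetical helper (not part of mariposa): unweighted Cronbach's alpha
# alpha = k / (k - 1) * (1 - sum(item variances) / variance(sum score))
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  items <- items[stats::complete.cases(items), , drop = FALSE]  # listwise deletion
  k <- ncol(items)
  item_var  <- apply(items, 2, stats::var)
  total_var <- stats::var(rowSums(items))
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

# Simulated data: three items driven by one common factor
set.seed(1)
f <- rnorm(200)
trust_items <- data.frame(
  trust_government = f + rnorm(200),
  trust_media      = f + rnorm(200),
  trust_science    = f + rnorm(200)
)
alpha_trust <- cronbach_alpha(trust_items)
alpha_trust

# Unweighted index score: the mean across items for each respondent
m_trust <- rowMeans(trust_items, na.rm = TRUE)
```

Identical items give alpha = 1, and weakly related items push it toward 0, which makes the helper easy to sanity-check.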

Core Features

Statistical Functions

| Category | Functions | Purpose |
|---|---|---|
| Data Import | read_spss(), read_stata(), read_sas(), read_xlsx(), + 2 more | Import SPSS, Stata, SAS, and Excel files with tagged NA support |
| Data Export | write_spss(), write_stata(), write_xpt(), write_xlsx() | Export with full roundtripping of labels and missing values |
| Label Management | var_label(), val_labels(), to_label(), set_na(), + 6 more | Get and set labels, convert formats, declare missing values |
| Data Transformation | rec(), to_dummy(), std(), center(), find_var() | Recoding, dummy coding, standardization, centering |
| Descriptive | describe(), frequency(), crosstab(), codebook() | Summaries, distributions, and data dictionaries |
| T-Tests | t_test() | Mean comparisons (independent, paired, one-sample) |
| ANOVA | oneway_anova(), factorial_anova(), ancova() | One-way and multi-factor ANOVA and ANCOVA with Type III sums of squares |
| Non-parametric | mann_whitney(), kruskal_wallis(), wilcoxon_test(), friedman_test(), binomial_test() | Distribution-free tests |
| Categorical tests | chi_square(), fisher_test(), chisq_gof(), mcnemar_test() | Categorical associations, goodness of fit, and exact tests |
| Correlation | pearson_cor(), spearman_rho(), kendall_tau() | Relationships between variables |
| Post-hoc | tukey_test(), scheffe_test(), levene_test(), dunn_test(), pairwise_wilcoxon() | Follow-up comparisons (parametric and non-parametric) and variance homogeneity checks |
| Scale analysis | reliability(), efa(), row_means(), row_sums(), row_count(), pomps() | Cronbach's alpha, exploratory factor analysis, index construction |
| Regression | linear_regression(), logistic_regression() | Linear and logistic models with SPSS-style output |
| Effect sizes | phi(), cramers_v(), goodman_gamma() | Effect size measures for categorical data |
| Weighted stats | w_mean(), w_median(), w_sd(), + 8 more | Individual weighted statistics |
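The categorical effect sizes listed in the table follow standard definitions. As one example, Cramér's V is sqrt(chi^2 / (n * (min(r, c) - 1))), computed from the uncorrected Pearson chi-square. The sketch below is unweighted base R; the helper `cramers_v_base()` is hypothetical, and mariposa's cramers_v() may handle weights, missing values, and output formatting differently.

```r
# Hypothetical helper (not mariposa's cramers_v()): unweighted Cramer's V
# V = sqrt(chi^2 / (n * (min(rows, cols) - 1))), Pearson chi^2 without Yates correction
cramers_v_base <- function(tab) {
  tab  <- as.matrix(tab)
  chi2 <- suppressWarnings(
    stats::chisq.test(tab, correct = FALSE)$statistic
  )
  n <- sum(tab)
  k <- min(dim(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

perfect <- matrix(c(10, 0, 0, 10), nrow = 2)
cramers_v_base(perfect)      # perfect association

independent <- matrix(c(10, 10, 10, 10), nrow = 2)
cramers_v_base(independent)  # no association
```

For a 2 x 2 table, V equals the absolute value of phi, which is why both measures appear side by side.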

Survey Weights Built-In

Every analysis function accepts survey weights through the weights argument:

# Weighted mean, median, SD
survey_data %>%
  w_mean(age, income, weights = sampling_weight)

# Grouped weighted analysis
survey_data %>%
  group_by(region) %>%
  describe(life_satisfaction, weights = sampling_weight)
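A common convention, and the one SPSS applies under WEIGHT BY, treats weights as frequency weights: the weighted n is sum(w), the mean is sum(w * x) / sum(w), and the variance divides by sum(w) - 1. The base-R sketch below implements that convention so weighted results can be sanity-checked by hand. It assumes mariposa's w_mean() and w_sd() follow the same frequency-weight formulas, which the SPSS validation suggests but this README does not spell out; the helpers `w_mean_base()` and `w_sd_base()` are hypothetical.

```r
# Hypothetical helpers illustrating frequency-weight formulas (SPSS WEIGHT BY style)
w_mean_base <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  sum(w[ok] * x[ok]) / sum(w[ok])
}

w_sd_base <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  x <- x[ok]
  w <- w[ok]
  m <- sum(w * x) / sum(w)
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))  # weighted n minus 1 in the denominator
}

x <- c(2, 4, 4, 5, 7)
w <- c(1, 2, 1, 3, 1)

w_mean_base(x, w)
w_sd_base(x, w)

# With integer weights, the same numbers come from the row-expanded data
x_expanded <- rep(x, times = w)
c(mean(x_expanded), sd(x_expanded))
```

The expansion check only works for integer weights, but it is a quick way to confirm that a weighted routine uses sum(w), not the number of rows, as its sample size.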

Tidyverse Integration

Full support for pipes and grouped operations:

survey_data %>%
  filter(age >= 18) %>%
  group_by(region) %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)

Multi-Factor ANOVA & ANCOVA

# Factorial ANOVA with Type III SS
survey_data %>%
  factorial_anova(dv = income, between = c(gender, education),
                  weights = sampling_weight)

# ANCOVA with covariate adjustment
survey_data %>%
  ancova(dv = income, between = gender, covariate = age,
         weights = sampling_weight)
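Type III sums of squares test each term after adjusting for every other term in the model, including interactions. One common way to reproduce them in base R is to switch to sum-to-zero contrasts and drop each term from the full model with drop1(..., scope = . ~ ., test = "F"). The sketch below uses simulated, unweighted data whose column names only mirror the examples above; it is an independent cross-check, not mariposa's implementation, and does not reproduce SPSS's handling of weighted degrees of freedom.

```r
# Sum-to-zero contrasts are needed for meaningful Type III tests
old <- options(contrasts = c("contr.sum", "contr.poly"))

set.seed(42)
d <- data.frame(
  gender    = factor(sample(c("female", "male"), 300, replace = TRUE)),
  education = factor(sample(c("low", "mid", "high"), 300, replace = TRUE)),
  age       = round(runif(300, 18, 80))
)
d$income <- 2000 + 15 * d$age + 300 * (d$gender == "male") + rnorm(300, sd = 500)

# Factorial ANOVA: each term is dropped from the full model (Type III logic)
fit <- lm(income ~ gender * education, data = d)
type3 <- drop1(fit, scope = . ~ ., test = "F")
type3

# ANCOVA: factor plus covariate, tested the same way
fit_ancova <- lm(income ~ gender + age, data = d)
type3_ancova <- drop1(fit_ancova, scope = . ~ ., test = "F")
type3_ancova

options(old)  # restore the previous contrast settings
```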

S3 Post-Hoc Methods

# Parametric: ANOVA → Tukey/Scheffe
result <- survey_data %>%
  oneway_anova(life_satisfaction, group = education, weights = sampling_weight)

result %>% tukey_test()    # Pairwise comparisons
result %>% levene_test()   # Variance homogeneity

# Non-parametric: Kruskal-Wallis → Dunn
kw_result <- survey_data %>%
  kruskal_wallis(life_satisfaction, group = education)

kw_result %>% dunn_test()  # Pairwise Dunn comparisons

Flexible Output: Compact & Detailed

Every analysis function provides two output levels. Typing the result name prints a compact one-line summary. Calling summary() produces the full SPSS-style output with all details. You can toggle individual sections on or off:

# Compact one-line summary (default)
survey_data %>%
  t_test(life_satisfaction, group = gender)

# Full detailed output
survey_data %>%
  t_test(life_satisfaction, group = gender) %>%
  summary()

# Toggle individual sections
survey_data %>%
  t_test(life_satisfaction, group = gender) %>%
  summary(effect_sizes = FALSE)

This works for all analysis functions — t_test(), oneway_anova(), chi_square(), pearson_cor(), reliability(), linear_regression(), and more.
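The compact/detailed split follows a standard R S3 idiom: the analysis function returns a classed object, its print() method writes one line, and summary() builds a richer object whose own print() method writes the full sections, with arguments acting as toggles. The sketch below shows that pattern with a hypothetical class `mini_ttest` and a hypothetical toggle `show_effect_size`; it illustrates the design only and is not mariposa's code.

```r
# Hypothetical constructor: unweighted Welch t-test wrapped in a custom S3 class
mini_ttest <- function(x, g) {
  g  <- factor(g)
  res <- stats::t.test(x ~ g)
  xs <- split(x, g)
  n  <- lengths(xs)
  # Cohen's d with the pooled SD, for illustration
  sp <- sqrt(((n[1] - 1) * var(xs[[1]]) + (n[2] - 1) * var(xs[[2]])) / (sum(n) - 2))
  d  <- unname((mean(xs[[1]]) - mean(xs[[2]])) / sp)
  structure(list(test = res, d = d, n = n), class = "mini_ttest")
}

# Compact output: one line when the object is printed
print.mini_ttest <- function(x, ...) {
  cat(sprintf("t(%.1f) = %.2f, p = %.3f\n",
              x$test$parameter, x$test$statistic, x$test$p.value))
  invisible(x)
}

# Detailed output: summary() returns a new object that remembers the toggles
summary.mini_ttest <- function(object, show_effect_size = TRUE, ...) {
  structure(c(object, list(show_effect_size = show_effect_size)),
            class = "summary.mini_ttest")
}

print.summary.mini_ttest <- function(x, ...) {
  cat("Group sizes:", paste(names(x$n), x$n, sep = " = ", collapse = ", "), "\n")
  cat(sprintf("Welch t = %.3f, df = %.2f, p = %.4f\n",
              x$test$statistic, x$test$parameter, x$test$p.value))
  if (isTRUE(x$show_effect_size)) cat(sprintf("Cohen's d = %.3f\n", x$d))
  invisible(x)
}

# Usage
set.seed(3)
res <- mini_ttest(c(rnorm(30, 5), rnorm(30, 6)), rep(c("f", "m"), each = 30))
res                                     # compact line
summary(res)                            # detailed output
summary(res, show_effect_size = FALSE)  # one section switched off
```

Keeping the computation in the constructor and the formatting in print methods means summary() never recomputes anything; it only decides what to show.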

SPSS Compatibility

Every function is validated against SPSS v29 across four scenarios: weighted/unweighted and grouped/ungrouped. If you’re migrating from SPSS, your results will match:

SPSS:

WEIGHT BY sampling_weight.
T-TEST GROUPS=gender(1 2)
  /VARIABLES=life_satisfaction.

mariposa:

survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)
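Under SPSS-style frequency weights, a weighted independent-samples t-test is equivalent to an ordinary t-test on data in which each case is replicated w times. The sketch below computes the equal-variance (pooled) t statistic from weighted group means and variances, taking each group's n as the sum of its weights, and checks it against t.test() on row-expanded data. It assumes this frequency-weight convention is the one being matched; the helper `weighted_pooled_t()` and the toy data are hypothetical.

```r
# Hypothetical helper: pooled-variance t statistic with frequency weights
weighted_pooled_t <- function(x, g, w) {
  by_group <- lapply(split(data.frame(x, w), g), function(d) {
    n <- sum(d$w)                           # weighted group size
    m <- sum(d$w * d$x) / n                 # weighted mean
    v <- sum(d$w * (d$x - m)^2) / (n - 1)   # weighted variance
    c(n = n, m = m, v = v)
  })
  a <- by_group[[1]]
  b <- by_group[[2]]
  df  <- unname(a["n"] + b["n"] - 2)
  sp2 <- unname(((a["n"] - 1) * a["v"] + (b["n"] - 1) * b["v"]) / df)
  t   <- unname((a["m"] - b["m"]) / sqrt(sp2 * (1 / a["n"] + 1 / b["n"])))
  c(t = t, df = df)
}

x <- c(6, 7, 5, 8, 4, 6, 5, 7)
g <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
w <- c(2, 1, 1, 3, 1, 2, 2, 1)
res <- weighted_pooled_t(x, g, w)
res

# Cross-check: replicate each case w times and run a classic Student t-test
tt <- t.test(rep(x, w) ~ rep(g, w), var.equal = TRUE)
tt
```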

License

MIT - Yannick Diehl