mariposa - Professional Survey Analysis in R

Professional statistical analysis for survey data in R.

mariposa (Marburg Initiative for Political and Social Analysis) provides 76 functions for importing, managing, transforming, and analyzing survey data. It covers the full workflow: data import (SPSS, Stata, SAS, Excel), label management, recoding, and standardization, followed by statistical analysis with survey weights, grouped operations via dplyr::group_by(), and publication-ready output. All statistical results are validated against SPSS v29 for full reproducibility.

Installation

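# Install from GitHub (requires the devtools package)
# install.packages("devtools")  # run once if devtools is not installed
devtools::install_github("YannickDiehl/mariposa")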

Quick Start

library(mariposa)
library(dplyr)

# Load example survey data (2,500 respondents)
data(survey_data)

# Interactive HTML codebook in RStudio Viewer
codebook(survey_data)

# Descriptive statistics with survey weights
survey_data %>%
  describe(age, income, life_satisfaction, weights = sampling_weight)

# Frequency table
survey_data %>%
  frequency(education, weights = sampling_weight)

# Compare groups with t-test
survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)

# Detailed SPSS-style output with summary()
survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight) %>%
  summary()

# Scale analysis workflow
reliability(survey_data, trust_government, trust_media, trust_science) %>%
  summary()    # item statistics, inter-item correlations

survey_data <- survey_data %>%
  mutate(m_trust = row_means(., trust_government, trust_media, trust_science))

# Regression
survey_data %>%
  linear_regression(life_satisfaction ~ age + income, weights = sampling_weight) %>%
  summary()    # coefficients, ANOVA table, diagnostics
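To make the scale analysis step above more concrete, here is a minimal base R sketch of the textbook Cronbach's alpha formula and a row-mean index. It is not mariposa's implementation: the helper name alpha_sketch, the listwise deletion of missing values, and the toy data are assumptions made for illustration only.

```r
# Illustrative only: textbook Cronbach's alpha, not mariposa's reliability().
alpha_sketch <- function(items) {
  items <- stats::na.omit(as.data.frame(items))  # listwise deletion (assumption)
  k <- ncol(items)                               # number of items
  item_vars <- vapply(items, stats::var, numeric(1))
  total_var <- stats::var(rowSums(items))        # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical toy data: three 5-point items from six respondents
trust <- data.frame(
  trust_government = c(2, 3, 4, 4, 5, 1),
  trust_media      = c(1, 3, 4, 5, 5, 2),
  trust_science    = c(2, 2, 5, 4, 4, 1)
)

alpha <- alpha_sketch(trust)

# Mean index per respondent, analogous in spirit to the m_trust variable above
m_trust <- rowMeans(trust, na.rm = TRUE)

print(alpha)
print(m_trust)
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is usually checked before building a mean index.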

Core Features

Statistical Functions

Category	Functions	Purpose
Data Import	`read_spss()`, `read_stata()`, `read_sas()`, `read_xlsx()`, + 2 more	Import SPSS, Stata, SAS, and Excel with tagged NA support
Data Export	`write_spss()`, `write_stata()`, `write_xpt()`, `write_xlsx()`	Export with full label and missing value roundtripping
Label Management	`var_label()`, `val_labels()`, `to_label()`, `set_na()`, + 6 more	Get/set labels, convert formats, declare missing values
Data Transformation	`rec()`, `to_dummy()`, `std()`, `center()`, `find_var()`	Recoding, dummy coding, standardization, centering
Descriptive	`describe()`, `frequency()`, `crosstab()`, `codebook()`	Summaries, distributions, and data dictionaries
T-Tests	`t_test()`	Mean comparisons (independent, paired, one-sample)
ANOVA	`oneway_anova()`, `factorial_anova()`, `ancova()`	One-way, multi-factor ANOVA, and ANCOVA with Type III SS
Non-parametric	`mann_whitney()`, `kruskal_wallis()`, `wilcoxon_test()`, `friedman_test()`, `binomial_test()`	Distribution-free tests
Categorical & exact tests	`chi_square()`, `fisher_test()`, `chisq_gof()`, `mcnemar_test()`	Categorical associations and exact tests
Correlation	`pearson_cor()`, `spearman_rho()`, `kendall_tau()`	Relationships between variables
Post-hoc	`tukey_test()`, `scheffe_test()`, `levene_test()`, `dunn_test()`, `pairwise_wilcoxon()`	Follow-up analyses (parametric and non-parametric)
Scale analysis	`reliability()`, `efa()`, `row_means()`, `row_sums()`, `row_count()`, `pomps()`	Cronbach’s Alpha, factor analysis, index construction
Regression	`linear_regression()`, `logistic_regression()`	Linear and logistic models with SPSS-style output
Effect sizes	`phi()`, `cramers_v()`, `goodman_gamma()`	Effect size measures for categorical data
Weighted stats	`w_mean()`, `w_median()`, `w_sd()`, + 8 more	Individual weighted statistics

Survey Weights Built-In

Analysis functions accept survey weights through the weights argument:

# Weighted mean (see also w_median() and w_sd())
survey_data %>%
  w_mean(age, income, weights = sampling_weight)

# Grouped weighted analysis
survey_data %>%
  group_by(region) %>%
  describe(life_satisfaction, weights = sampling_weight)
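As a rough illustration of what a weighted mean and standard deviation compute, the base R sketch below uses the frequency-weight convention, in which a weight of 2 counts a respondent twice. Whether mariposa uses exactly these formulas is an assumption here; the helper names w_mean_sketch and w_sd_sketch and the toy data are hypothetical.

```r
# Illustrative only: frequency-weight formulas, not mariposa's internals.
w_mean_sketch <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)          # drop cases with missing value or weight
  sum(w[ok] * x[ok]) / sum(w[ok])
}

w_sd_sketch <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  m <- w_mean_sketch(x, w)
  # Denominator is (sum of weights - 1), as for replicated cases
  sqrt(sum(w[ok] * (x[ok] - m)^2) / (sum(w[ok]) - 1))
}

# Hypothetical toy data: ages with weights in two regions
age    <- c(23, 35, 41, 58, 62, NA)
weight <- c(1, 2, 1, 3, 1, 2)
region <- c("East", "East", "East", "West", "West", "West")

print(w_mean_sketch(age, weight))
print(w_sd_sketch(age, weight))

# Grouped version: apply the same helper within each region
by_region <- sapply(split(seq_along(age), region), function(i) {
  w_mean_sketch(age[i], weight[i])
})
print(by_region)
```

With integer weights, both helpers give the same result as running mean() and sd() on a data set in which each case is repeated as many times as its weight.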

Tidyverse Integration

Full support for pipes and grouped operations:

survey_data %>%
  filter(age >= 18) %>%
  group_by(region) %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)

Multi-Factor ANOVA & ANCOVA

# Factorial ANOVA with Type III SS
survey_data %>%
  factorial_anova(dv = income, between = c(gender, education),
                  weights = sampling_weight)

# ANCOVA with covariate adjustment
survey_data %>%
  ancova(dv = income, between = gender, covariate = age,
         weights = sampling_weight)
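For readers unfamiliar with Type III sums of squares, the sketch below shows one common way to obtain them in base R: fit the model with sum-to-zero contrasts and test each term by dropping it while keeping all others. This is a generic illustration on the built-in ToothGrowth data, not a description of how mariposa computes its results.

```r
# Illustrative only: Type III sums of squares in base R, not mariposa's code.
tg <- transform(ToothGrowth, dose = factor(dose))

# Sum-to-zero contrasts keep main-effect tests interpretable when an
# interaction is in the model.
fit <- lm(len ~ supp * dose, data = tg,
          contrasts = list(supp = "contr.sum", dose = "contr.sum"))

# Dropping each term in turn, with every other term retained, gives Type III tests.
type3 <- drop1(fit, scope = . ~ ., test = "F")
print(type3)

# ANCOVA-style model: one factor plus a numeric covariate (dose kept numeric)
ancova_fit <- lm(len ~ supp + dose, data = ToothGrowth,
                 contrasts = list(supp = "contr.sum"))
ancova_type3 <- drop1(ancova_fit, scope = . ~ ., test = "F")
print(ancova_type3)
```

ToothGrowth is a balanced design, so the Type III values here coincide with the sequential (Type I) ones from anova(); the two differ once cell sizes are unequal, which is typical for survey data.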

S3 Post-Hoc Methods

# Parametric: ANOVA → Tukey/Scheffe
result <- survey_data %>%
  oneway_anova(life_satisfaction, group = education, weights = sampling_weight)

result %>% tukey_test()    # Pairwise comparisons
result %>% levene_test()   # Variance homogeneity

# Non-parametric: Kruskal-Wallis → Dunn
kw_result <- survey_data %>%
  kruskal_wallis(life_satisfaction, group = education)

kw_result %>% dunn_test()  # Pairwise Dunn comparisons
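The pattern above, where a post-hoc function picks the right procedure based on the result object it receives, is standard S3 dispatch. The sketch below shows the mechanism with a hypothetical generic, posthoc_sketch, applied to a base R aov fit. The generic and its methods are invented for illustration and are not part of mariposa.

```r
# Illustrative only: a hypothetical generic, not part of mariposa.
posthoc_sketch <- function(x, ...) UseMethod("posthoc_sketch")

# Method for base R ANOVA fits: delegate to Tukey's HSD
posthoc_sketch.aov <- function(x, ...) stats::TukeyHSD(x, ...)

# Fallback for any other class
posthoc_sketch.default <- function(x, ...) {
  stop("no post-hoc method for objects of class ", class(x)[1])
}

# One-way ANOVA on built-in data, then the post-hoc step on the result object
tg  <- transform(ToothGrowth, dose = factor(dose))
fit <- aov(len ~ dose, data = tg)

ph <- posthoc_sketch(fit)
print(ph)
```

Because the method is chosen from the class of the result, the caller never has to pass the data, the grouping variable, or the weights a second time.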

Flexible Output: Compact & Detailed

Every analysis function provides two output levels. Printing the result (for example, by typing its name) gives a compact one-line summary. Calling summary() produces the full SPSS-style output. Arguments to summary() toggle individual sections on or off:

# Compact one-line summary (default)
survey_data %>%
  t_test(life_satisfaction, group = gender)

# Full detailed output
survey_data %>%
  t_test(life_satisfaction, group = gender) %>%
  summary()

# Toggle individual sections
survey_data %>%
  t_test(life_satisfaction, group = gender) %>%
  summary(effect_sizes = FALSE)

This works for all analysis functions, including t_test(), oneway_anova(), chi_square(), pearson_cor(), reliability(), and linear_regression().
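The two output levels correspond to the usual S3 split between print() and summary() methods. The following base R sketch shows that split on a hypothetical class, my_t_test, which wraps stats::t.test(). The class, its fields, and the group_sizes toggle are assumptions made for illustration and do not reflect mariposa's actual objects or arguments.

```r
# Illustrative only: a hypothetical result class, not mariposa's objects.
my_t_test <- function(x, g) {
  fit <- stats::t.test(x ~ g)                       # Welch two-sample t-test
  structure(list(fit = fit, n = table(g)), class = "my_t_test")
}

# Compact level: one line, used whenever the object is printed
print.my_t_test <- function(x, ...) {
  cat(sprintf("t(%.1f) = %.2f, p = %.3f\n",
              x$fit$parameter, x$fit$statistic, x$fit$p.value))
  invisible(x)
}

# Detailed level: full output, with an argument that toggles one section
summary.my_t_test <- function(object, group_sizes = TRUE, ...) {
  print(object$fit)
  if (group_sizes) {
    cat("Group sizes:\n")
    print(object$n)
  }
  invisible(object)
}

res <- my_t_test(sleep$extra, sleep$group)
res                                   # compact one-line summary
summary(res, group_sizes = FALSE)     # detailed output, one section switched off
```

Keeping the computation in the constructor and the formatting in the methods means the same result object can be printed at either level without refitting.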

SPSS Compatibility

Every statistical function is validated against SPSS v29 in four scenarios, formed by crossing weighted or unweighted with grouped or ungrouped analysis. If you’re migrating from SPSS, your results will match:

SPSS:

WEIGHT BY sampling_weight.
T-TEST GROUPS=gender(1 2)
  /VARIABLES=life_satisfaction.

mariposa:

survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)
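To show what a weighted comparison of this kind involves, the base R sketch below computes a pooled-variance t-test under the frequency-weight convention that SPSS's WEIGHT BY applies. It is a simplified illustration: the helper name weighted_t_sketch and the toy data are hypothetical, and it says nothing about how mariposa or SPSS handle details such as unequal variances or non-integer weights.

```r
# Illustrative only: pooled-variance t-test with frequency weights.
weighted_t_sketch <- function(x, g, w) {
  # Weighted n, mean, and variance within each of the two groups
  by_group <- lapply(split(data.frame(x = x, w = w), g), function(d) {
    n <- sum(d$w)
    m <- sum(d$w * d$x) / n
    v <- sum(d$w * (d$x - m)^2) / (n - 1)
    c(n = n, mean = m, var = v)
  })
  a <- by_group[[1]]
  b <- by_group[[2]]

  df     <- a[["n"]] + b[["n"]] - 2
  pooled <- ((a[["n"]] - 1) * a[["var"]] + (b[["n"]] - 1) * b[["var"]]) / df
  t_stat <- (a[["mean"]] - b[["mean"]]) /
    sqrt(pooled * (1 / a[["n"]] + 1 / b[["n"]]))

  c(t = t_stat, df = df, p = 2 * stats::pt(-abs(t_stat), df))
}

# Hypothetical toy data: scores, group labels, and integer weights
score  <- c(1, 2, 3, 4, 6, 8)
group  <- c("a", "a", "a", "b", "b", "b")
weight <- c(2, 1, 1, 1, 2, 1)

res <- weighted_t_sketch(score, group, weight)
print(res)
```

With integer weights, the result equals an ordinary t.test(..., var.equal = TRUE) on data in which each case is repeated according to its weight.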

Documentation

Complete Reference - All functions with examples
Getting Started - Introduction and first steps
Scale Analysis - Reliability, factor analysis, and scale construction
Regression Analysis - Linear and logistic regression
Survey Weights Guide - Working with weighted data

Support

GitHub Issues - Bug reports and feature requests
GitHub Discussions - Questions and ideas

License

MIT - Yannick Diehl