Professional statistical analysis for survey data in R.
mariposa (Marburg Initiative for Political and Social Analysis) provides 76 functions for importing, managing, transforming, and analyzing survey data. It covers the full workflow: data import (SPSS, Stata, SAS, Excel), label management, recoding, and standardization, followed by statistical analysis with survey weights, grouped operations via dplyr::group_by(), and publication-ready output. All statistical results are validated against SPSS v29 for full reproducibility.
Quick Start
library(mariposa)
library(dplyr)
# Load example survey data (2,500 respondents)
data(survey_data)
# Interactive HTML codebook in RStudio Viewer
codebook(survey_data)
# Descriptive statistics with survey weights
survey_data %>%
  describe(age, income, life_satisfaction, weights = sampling_weight)
# Frequency table
survey_data %>%
  frequency(education, weights = sampling_weight)
# Compare groups with t-test
survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)
# Detailed SPSS-style output with summary()
survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight) %>%
  summary()
# Scale analysis workflow
reliability(survey_data, trust_government, trust_media, trust_science) %>%
  summary() # item statistics, inter-item correlations
survey_data <- survey_data %>%
  mutate(m_trust = row_means(., trust_government, trust_media, trust_science))
# Regression
survey_data %>%
  linear_regression(life_satisfaction ~ age + income, weights = sampling_weight) %>%
  summary() # coefficients, ANOVA table, diagnostics
Core Features
Statistical Functions
| Category | Functions | Purpose |
|---|---|---|
| Data Import | read_spss(), read_stata(), read_sas(), read_xlsx(), + 2 more | Import SPSS, Stata, SAS, and Excel with tagged NA support |
| Data Export | write_spss(), write_stata(), write_xpt(), write_xlsx() | Export with full label and missing value roundtripping |
| Label Management | var_label(), val_labels(), to_label(), set_na(), + 6 more | Get/set labels, convert formats, declare missing values |
| Data Transformation | rec(), to_dummy(), std(), center(), find_var() | Recoding, dummy coding, standardization, centering |
| Descriptive | describe(), frequency(), crosstab(), codebook() | Summaries, distributions, and data dictionaries |
| T-Tests | t_test() | Mean comparisons (independent, paired, one-sample) |
| ANOVA | oneway_anova(), factorial_anova(), ancova() | One-way, multi-factor ANOVA, and ANCOVA with Type III SS |
| Non-parametric | mann_whitney(), kruskal_wallis(), wilcoxon_test(), friedman_test(), binomial_test() | Distribution-free tests |
| Exact tests | chi_square(), fisher_test(), chisq_gof(), mcnemar_test() | Categorical associations and exact tests |
| Correlation | pearson_cor(), spearman_rho(), kendall_tau() | Relationships between variables |
| Post-hoc | tukey_test(), scheffe_test(), levene_test(), dunn_test(), pairwise_wilcoxon() | Follow-up analyses (parametric and non-parametric) |
| Scale analysis | reliability(), efa(), row_means(), row_sums(), row_count(), pomps() | Cronbach’s Alpha, factor analysis, index construction |
| Regression | linear_regression(), logistic_regression() | Linear and logistic models with SPSS-style output |
| Effect sizes | phi(), cramers_v(), goodman_gamma() | Effect size measures for categorical data |
| Weighted stats | w_mean(), w_median(), w_sd(), + 8 more | Individual weighted statistics |
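To make the "Weighted stats" row concrete, here is a minimal base R sketch of what a weighted mean and a weighted standard deviation compute. It assumes the weights behave like frequency weights, which is how SPSS's WEIGHT BY treats them. The helpers `wtd_mean_demo()` and `wtd_sd_demo()` are hypothetical names for illustration; they are not mariposa's `w_mean()` or `w_sd()`, whose exact formulas may differ.

```r
# Hypothetical helpers for illustration only; not part of mariposa.

# Weighted mean: sum(w * x) / sum(w), ignoring cases with missing x or w
wtd_mean_demo <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  sum(w[keep] * x[keep]) / sum(w[keep])
}

# Weighted SD under a frequency-weight assumption:
# weighted squared deviations divided by (sum of weights - 1)
wtd_sd_demo <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  m <- sum(w * x) / sum(w)
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))
}

x <- c(2, 4, 6, 8)
w <- c(1, 2, 1, 3)

wtd_mean_demo(x, w)
wtd_sd_demo(x, w)

# With integer weights, both results equal those obtained by
# repeating each case w times and using the unweighted functions.
expanded <- rep(x, times = w)
all.equal(wtd_mean_demo(x, w), mean(expanded))
all.equal(wtd_sd_demo(x, w), sd(expanded))
```

The comparison against the expanded data is a quick way to check any weighted statistic that is meant to follow the frequency-weight convention.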
Multi-Factor ANOVA & ANCOVA
# Factorial ANOVA with Type III SS
survey_data %>%
  factorial_anova(dv = income, between = c(gender, education),
                  weights = sampling_weight)
# ANCOVA with covariate adjustment
survey_data %>%
  ancova(dv = income, between = gender, covariate = age,
         weights = sampling_weight)
S3 Post-Hoc Methods
# Parametric: ANOVA → Tukey/Scheffe
result <- survey_data %>%
  oneway_anova(life_satisfaction, group = education, weights = sampling_weight)
result %>% tukey_test() # Pairwise comparisons
result %>% levene_test() # Variance homogeneity
# Non-parametric: Kruskal-Wallis → Dunn
kw_result <- survey_data %>%
  kruskal_wallis(life_satisfaction, group = education)
kw_result %>% dunn_test() # Pairwise Dunn comparisons
Flexible Output: Compact & Detailed
Every analysis function provides two output levels. Typing the result name prints a compact one-line summary. Calling summary() produces the full SPSS-style output with all details. You can toggle individual sections on or off:
# Compact one-line summary (default)
survey_data %>%
  t_test(life_satisfaction, group = gender)
# Full detailed output
survey_data %>%
  t_test(life_satisfaction, group = gender) %>%
  summary()
# Toggle individual sections
survey_data %>%
  t_test(life_satisfaction, group = gender) %>%
  summary(effect_sizes = FALSE)
This works for all analysis functions, including t_test(), oneway_anova(), chi_square(), pearson_cor(), reliability(), linear_regression(), and more.
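The two output levels follow a common R pattern: an S3 class with a short `print()` method and a longer `summary()` method. The sketch below shows the idea in base R, assuming a hypothetical class `demo_ttest` built around `stats::t.test()`. It does not show how mariposa implements its output, and only the `effect_sizes` argument name is taken from the example above.

```r
# Hypothetical class "demo_ttest" for illustration only; not mariposa code.

demo_ttest <- function(x, g) {
  fit <- stats::t.test(x ~ g)  # Welch two-sample t-test from base R
  # Cohen's d based on a pooled standard deviation
  n <- tapply(x, g, length)
  m <- tapply(x, g, mean)
  v <- tapply(x, g, var)
  pooled_sd <- sqrt(sum((n - 1) * v) / (sum(n) - 2))
  d <- unname((m[1] - m[2]) / pooled_sd)
  structure(list(fit = fit, cohens_d = d), class = "demo_ttest")
}

# print(): compact one-line summary, used when the object is auto-printed
print.demo_ttest <- function(x, ...) {
  cat(sprintf("t(%.1f) = %.2f, p = %.3f\n",
              x$fit$parameter, x$fit$statistic, x$fit$p.value))
  invisible(x)
}

# summary(): detailed output; a section can be switched off via an argument
summary.demo_ttest <- function(object, effect_sizes = TRUE, ...) {
  cat("Welch two-sample t-test\n")
  cat(sprintf("  t = %.2f, df = %.1f, p = %.3f\n",
              object$fit$statistic, object$fit$parameter, object$fit$p.value))
  cat(sprintf("  95%% CI of the difference: [%.2f, %.2f]\n",
              object$fit$conf.int[1], object$fit$conf.int[2]))
  if (effect_sizes) {
    cat(sprintf("  Cohen's d = %.2f\n", object$cohens_d))
  }
  invisible(object)
}

set.seed(1)
x <- c(rnorm(20, mean = 5), rnorm(20, mean = 6))
g <- rep(c("a", "b"), each = 20)

res <- demo_ttest(x, g)
res                                 # compact one-line output
summary(res)                        # detailed output
summary(res, effect_sizes = FALSE)  # detailed output without the effect size
```

For brevity, `summary()` prints directly here. A fuller design would usually return a summary object with its own `print()` method, so the detailed result can be stored and reused.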
SPSS Compatibility
Every function is validated against SPSS v29 across four scenarios: weighted/unweighted and grouped/ungrouped. If you’re migrating from SPSS, your results will match:
SPSS:
WEIGHT BY sampling_weight.
T-TEST GROUPS=gender(1 2)
  /VARIABLES=satisfaction.
mariposa:
survey_data %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)
Documentation
- Complete Reference - All functions with examples
- Getting Started - Introduction and first steps
- Scale Analysis - Reliability, factor analysis, and scale construction
- Regression Analysis - Linear and logistic regression
- Survey Weights Guide - Working with weighted data
Support
- GitHub Issues - Bug reports and feature requests
- GitHub Discussions - Questions and ideas
