Social Survey Data (Synthetic)

A synthetic dataset modeled after large-scale social surveys with demographics, attitudes, and survey design features. Created for testing and demonstration purposes without licensing concerns.

Usage

survey_data

Format

A data frame with 2,500 rows and 16 variables:

id: Unique identifier (1-2500)
age: Age in years (18-95)
gender: Gender (Male, Female)
region: Region (East, West)
education: Education level (Basic Secondary, Intermediate Secondary, Academic Secondary, University)
income: Monthly household income in EUR (800-15000)
employment: Employment status (Student, Employed, Unemployed, Retired, Other)
political_orientation: Political orientation (1=very left to 5=very right)
environmental_concern: Environmental concern (1=not concerned to 5=very concerned)
life_satisfaction: Life satisfaction (1=very dissatisfied to 5=very satisfied)
trust_government: Trust in government (1=no trust to 5=complete trust)
trust_media: Trust in media (1=no trust to 5=complete trust)
trust_science: Trust in science (1=no trust to 5=complete trust)
sampling_weight: Post-stratification sampling weight (0.7-1.4)
stratum: Stratification variable combining region and age group
interview_mode: Interview mode (Face-to-face, Telephone, Online)

Source

Generated synthetically using realistic demographic and attitudinal patterns. No real survey data was used.

Details

This dataset contains realistic patterns of correlation between variables:

Education correlates with income and political attitudes
Regional differences reflect East/West patterns
Age effects on employment and attitudes
Realistic missing data patterns (3-12% depending on sensitivity)
Survey weights for post-stratification adjustment

The data includes proper variable labels and follows social survey conventions for coding and structure. Generated with set.seed(2024) for reproducibility.

Examples

# Load required packages and data
library(dplyr)
data(survey_data)

# Basic descriptive statistics
survey_data %>% describe(age, income, weights = sampling_weight)
#> 
#> Weighted Descriptive Statistics
#> -------------------------------
#>  Variable     Mean Median       SD Range  IQR Skewness Effective_N
#>       age   50.514     50   17.084    77   25    0.159      2468.8
#>    income 3743.099   3500 1423.966  7200 1900    0.725      2158.9
#> ----------------------------------------

# Frequency analysis
survey_data %>% frequency(education, region, weights = sampling_weight)
#> Frequency: education [Weighted]
#>   4 categories, N valid = 2516, missing = 0
#> Frequency: region [Weighted]
#>   2 categories, N valid = 2516, missing = 0
#> Use summary() for detailed output.

# Group comparisons
survey_data %>%
  group_by(region) %>%
  t_test(life_satisfaction, group = gender, weights = sampling_weight)
#> [region = 1]
#> t-Test: life_satisfaction by gender [Weighted]
#>   t(484.7) = 0.641, p = 0.522 , g = 0.058 (negligible), N = 488
#> [region = 2]
#> t-Test: life_satisfaction by gender [Weighted]
#>   t(1901.1) = -1.548, p = 0.122 , g = -0.070 (negligible), N = 1949

Usage

Format

Source

Details

See also

Examples