Read SPSS Data with Tagged Missing Values

Reads an SPSS .sav file and preserves user-defined missing values as tagged NAs instead of converting them to regular NA. This allows you to distinguish between different types of missing data (e.g., "no answer", "not applicable", "refused") while still treating them as NA in standard R operations.

Usage

read_spss(path, tag_na = TRUE, encoding = NULL, verbose = FALSE)

Arguments

path: Path to an SPSS .sav file.
tag_na: If TRUE (the default), user-defined missing values are converted to tagged NAs using haven::tagged_na(). If FALSE, the file is read with standard haven::read_sav() behavior (user-defined missing values become regular NA and can no longer be told apart from system-missing values).
encoding: Character encoding for the file. If NULL, haven's default encoding detection is used.
verbose: If TRUE, prints a message summarizing how many values were converted.

Value

A tibble with the SPSS data. When tag_na = TRUE:

User-defined missing values are stored as tagged NAs
is.na() returns TRUE for these values (standard R behavior)
The original SPSS missing codes can be recovered via na_frequencies() or untag_na(); haven::na_tag() returns the tag character assigned to each value
Each tagged variable has an "na_tag_map" attribute mapping tag characters to original SPSS codes

Details

SPSS allows defining specific values as "user-defined missing values" (e.g., -9 = "no answer", -8 = "don't know"). When reading .sav files with haven::read_sav() and its default user_na = FALSE, these are silently converted to NA, losing the information about why a value is missing.
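The information loss described above can be reproduced with haven alone. The following is a minimal sketch, not part of read_spss() itself: the toy variable q1, its labels, and the temporary file are made up for illustration.

```r
library(haven)

# Toy survey item (hypothetical): -9 is declared as a user-defined
# missing value meaning "no answer".
survey <- tibble::tibble(
  q1 = labelled_spss(
    c(1, 2, -9),
    labels = c("agree" = 1, "disagree" = 2, "no answer" = -9),
    na_values = -9
  )
)

# Round-trip through a temporary .sav file.
path <- tempfile(fileext = ".sav")
write_sav(survey, path)

# Default behavior: user-defined missing values become plain NA.
default_read <- read_sav(path)
is.na(default_read$q1)

# With user_na = TRUE the code -9 is kept and flagged via "na_values".
keep_codes <- read_sav(path, user_na = TRUE)
attr(keep_codes$q1, "na_values")
```

In the default read, nothing in the third value records that it was originally -9, which is the gap read_spss() is meant to close.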

read_spss() preserves this information using haven's tagged NA system: each missing value type gets a unique tag character (a-z, A-Z, 0-9) that can be inspected with haven::na_tag(). The values still behave as NA in all standard R operations (mean(), sum(), is.na(), etc.).
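To see how tagged NAs behave independently of any SPSS file, the sketch below builds a small vector by hand with haven. The tags "a" and "b" are chosen arbitrarily here; which tag read_spss() assigns to which SPSS code is not assumed.

```r
library(haven)

# Two observed values, two differently tagged NAs, one regular NA.
x <- c(1, 2, tagged_na("a"), tagged_na("b"), NA)

# All three missing entries count as NA for base R.
is.na(x)

# na_tag() returns the tag character, or NA where no tag is present.
na_tag(x)

# is_tagged_na() distinguishes tagged NAs from the regular NA,
# optionally filtering on a specific tag.
is_tagged_na(x)
is_tagged_na(x, "a")

# Summary functions treat tagged NAs like any other NA.
mean(x, na.rm = TRUE)
```

Note that tagged NAs are stored in double vectors, so printing x with base R shows plain NA; haven::print_tagged_na(x) displays the tags.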

Use the companion functions to work with the tagged NAs:

na_frequencies() - Frequency table of missing types
untag_na() - Convert tagged NAs back to original SPSS codes
strip_tags() - Convert tagged NAs to regular NAs (drop tags)
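Conceptually, recovering codes and dropping tags can be sketched with haven and base R as follows. This is an illustration only, not the implementation of untag_na() or strip_tags(): the tag_map vector below is a hypothetical stand-in for the "na_tag_map" attribute, whose actual structure is assumed here.

```r
library(haven)

x <- c(1, 2, tagged_na("a"), tagged_na("b"), 5)

# Hypothetical tag-to-code map (assumed shape: names are tags,
# values are the original SPSS codes).
tag_map <- c(a = -9, b = -8)

# Recover codes: look up each tag, keep observed values unchanged.
tags <- na_tag(x)
restored <- ifelse(is.na(tags), x, unname(tag_map[tags]))
restored

# Drop tags: overwrite every missing entry with a regular NA.
stripped <- x
stripped[is.na(stripped)] <- NA_real_
na_tag(stripped)
```

After the first step the vector holds the numeric codes again and is.na() no longer flags them; after the second, the values are still missing but carry no tag.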

Examples

if (FALSE) { # \dontrun{
# Read SPSS file with tagged missing values
data <- read_spss("survey.sav")

# Check what types of missing values exist
na_frequencies(data$satisfaction)

# Standard R operations work normally (tagged NAs are dropped by na.rm = TRUE)
mean(data$satisfaction, na.rm = TRUE)

# frequency() shows each missing type separately
# (%>% requires magrittr or dplyr to be attached)
data %>% frequency(satisfaction)

# Recover original SPSS codes
original_codes <- untag_na(data$satisfaction)

# Drop the tags so that only regular NAs remain
satisfaction_clean <- strip_tags(data$satisfaction)
} # }