Replaces specific numeric values with NA (or tagged NAs) in one or more variables. Use it in data cleaning workflows where missing value codes (e.g., -9, -8, 99) were imported as regular values and need to be declared as missing.

Usage

set_na(data, ..., tag = TRUE, verbose = FALSE)

Arguments

data

A data frame, tibble, or a single vector.

...

Values to set as missing. Can be:

  • Unnamed numeric values: Applied to all numeric columns (e.g., set_na(data, -9, -8))

  • Named pairs: Applied to specific variables (e.g., set_na(data, income = c(-9, -8), age = -1))
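The difference in scope between the two forms can be sketched in base R. This is only an approximation of the tag = FALSE behaviour described on this page, not the function's actual implementation; the data frame `survey` and its columns are hypothetical.

```r
# Toy data; the column names are hypothetical
survey <- data.frame(
  id     = c("a", "b", "c"),
  income = c(2500, -9, 3100),
  age    = c(34, 51, -1),
  stringsAsFactors = FALSE
)

# Unnamed values: every numeric column is scanned for the codes
all_cols <- survey
numeric_cols <- vapply(all_cols, is.numeric, logical(1))
all_cols[numeric_cols] <- lapply(all_cols[numeric_cols], function(col) {
  col[col %in% c(-9, -1)] <- NA
  col
})

# Named pairs: only the named column is touched, using its own codes
by_name <- survey
codes <- list(income = c(-9, -8))
for (var in names(codes)) {
  by_name[[var]][by_name[[var]] %in% codes[[var]]] <- NA
}

all_cols$age  # the -1 is now NA
by_name$age   # unchanged, because age was not named
```

The character column `id` is left alone in both cases, which matches the statement that unnamed values apply to numeric columns only.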

tag

If TRUE (default), uses tagged NAs to preserve distinct missing types. The resulting tagged NAs integrate with na_frequencies(), frequency(), and codebook(). If FALSE, replaces with regular NA.

verbose

If TRUE, prints a summary of conversions.

Value

The modified data. Data frames are returned invisibly; vectors are returned visibly.

Details

Tagged vs. Regular NA

When tag = TRUE (default), each missing value code gets a unique tag character, so you can distinguish between "No answer" (-9) and "Not applicable" (-8) in downstream analysis. This is the same system used by read_spss() with tag.na = TRUE.

When tag = FALSE, all specified values become regular NA and the distinction between different missing types is lost.
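To make the contrast concrete, here is a small sketch built directly with the haven package. It assumes the tagged NAs produced here follow haven's convention, which this page does not state explicitly, and the tag letters "a" and "b" are illustrative rather than the ones the function would assign.

```r
library(haven)  # assumption: tagged NAs follow haven's convention

x <- c(3, -9, 5, -8, 4)  # -9 = "No answer", -8 = "Not applicable"

# tag = TRUE, conceptually: each code gets its own tag letter
# (the letters "a" and "b" are illustrative, not the package's actual choice)
tagged <- x
tagged[x == -9] <- tagged_na("a")
tagged[x == -8] <- tagged_na("b")

# tag = FALSE, conceptually: both codes collapse into plain NA
plain <- x
plain[x %in% c(-9, -8)] <- NA

is.na(tagged)   # both variants count as missing for is.na(), mean(), etc.
na_tag(tagged)  # but the tag still tells the two codes apart
na_tag(plain)   # all NA: the distinction is gone
```

Both vectors behave identically in ordinary computations; the tag only matters to functions that look for it.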

Interaction with Existing Labels

If a value being set to missing has an existing value label, that label is preserved as a tagged NA label (when tag = TRUE), making it visible in frequency() and codebook() output.
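What "preserved as a tagged NA label" could look like is sketched below with haven's labelled vectors. This is an assumption about the underlying representation, not a description of the function's internals; the tag letter "a" and the labels are made up for illustration.

```r
library(haven)  # assumption: labels are stored haven-style

# Before: -9 is an ordinary value that happens to carry a label
before <- labelled(
  c(1, 2, -9, 1),
  labels = c("Yes" = 1, "No" = 2, "No answer" = -9)
)

# After (conceptually): the -9 entries become a tagged NA, and the label
# "No answer" is attached to that tagged NA instead of to -9
after <- labelled(
  c(1, 2, tagged_na("a"), 1),
  labels = c("Yes" = 1, "No" = 2, "No answer" = tagged_na("a"))
)

# The label text is still reachable through the tag
label_values <- attr(after, "labels")
names(label_values)[is_tagged_na(label_values, "a")]
```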

See also

na_frequencies() for inspecting missing types; strip_tags() for converting tagged NAs to regular NA; untag_na() for recovering original codes.

Other labels: copy_labels(), drop_labels(), find_var(), to_character(), to_label(), to_labelled(), to_numeric(), unlabel(), val_labels(), var_label()

Examples

if (FALSE) { # \dontrun{
# Set -9 and -8 as missing across all numeric variables
data <- set_na(survey_data, -9, -8)

# Set missing for specific variables only
data <- set_na(survey_data,
  income = c(-9, -8, -42),
  life_satisfaction = c(-9, -11)
)

# Use regular NA instead of tagged NA
data_plain <- set_na(survey_data, -9, -8, tag = FALSE)

# Check the tagged missing types in the result
na_frequencies(data$income)
} # }