describe() summarizes numeric variables such as age, income, or
satisfaction scores. It is a natural first step in an analysis, helping you
understand what's typical, what's unusual, and how spread out your data is.
Think of it as a health check for your data that reveals:
- What's the average value?
- What's the middle value?
- How spread out are the responses?
- Are there unusual patterns or outliers?
Usage
describe(
  data,
  ...,
  weights = NULL,
  show = "short",
  probs = c(0.25, 0.5, 0.75),
  na.rm = TRUE,
  excess = TRUE
)
Arguments
- data
Your survey data (a data frame or tibble)
- ...
The numeric variables you want to summarize. List them separated by commas, or use helpers like starts_with("trust").
- weights
Optional survey weights for population-representative results. Without weights, the results describe your sample. With weights, they estimate the population the sample represents.
- show
Which statistics to display:
"short" (default): Essential stats (mean, median, SD, range, IQR, skewness)
"all": Everything, including variance, kurtosis, mode, and quantiles
Custom list: Choose specific stats, like c("mean", "sd", "range")
- probs
For quantiles, which percentiles to show (default: 25th, 50th, 75th)
- na.rm
Remove missing values before calculating? (Default: TRUE)
- excess
For kurtosis, report excess kurtosis (kurtosis minus 3, so a normal distribution scores 0)? (Default: TRUE, which is easier to interpret)
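As a rough illustration of what the weights, probs, and na.rm arguments control, the base R sketch below uses made-up toy values (the age and w vectors are hypothetical, not taken from survey_data). It does not call describe(); it only shows the underlying idea with mean(), weighted.mean(), and quantile() from base R.

```r
# Hypothetical toy data: five respondents with made-up ages and weights.
age <- c(20, 30, 40, 50, 60)
w   <- c(1, 1, 1, 1, 4)  # the last respondent stands in for four people

# Without weights you describe the sample; with weights you estimate
# the population the sample is meant to represent.
unweighted_mean <- mean(age)
weighted_mean   <- weighted.mean(age, w)

# Quartiles, matching the default probs = c(0.25, 0.5, 0.75).
quartiles <- quantile(age, probs = c(0.25, 0.5, 0.75))

# Effect of na.rm: a single missing value makes the mean NA unless removed.
age_with_gap <- c(age, NA)
mean_keep_na <- mean(age_with_gap)                # NA
mean_drop_na <- mean(age_with_gap, na.rm = TRUE)  # same as mean(age)

print(c(unweighted = unweighted_mean, weighted = weighted_mean))
print(quartiles)
```

Here the heavily weighted 60-year-old pulls the weighted mean above the unweighted one, which is the kind of shift to expect when some respondents represent more of the population than others.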
Details
Understanding the Results
Key statistics and what they tell you:
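- n: Number of valid (non-missing) responses; a low n relative to your sample size signals many missing values
- Mean: The average value
- Median: The middle value (half of the responses lie above, half below)
- SD: Standard deviation, how far values typically fall from the mean
- Range: The distance between the minimum and maximum values
- IQR: Interquartile range, the width of the middle 50% of values
- Skewness: Asymmetry of the distribution; negative means a longer left tail, positive a longer right tail
- Kurtosis: How heavy the tails are; high values point to more extreme outliers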
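To make the shape statistics concrete, here is a minimal base R sketch on a hypothetical toy vector x. It uses the simple moment-based formulas for skewness and kurtosis; this page does not state which estimator describe() uses (it may apply a bias correction), so treat the formulas as an assumption for illustration only.

```r
# Hypothetical toy data with one large value in the right tail.
x <- c(1, 2, 2, 3, 3, 3, 4, 10)

# Location and spread, using base R functions.
x_mean   <- mean(x)
x_median <- median(x)
x_sd     <- sd(x)
x_range  <- diff(range(x))  # max minus min
x_iqr    <- IQR(x)          # 75th minus 25th percentile

# Central moments (dividing by n, not n - 1).
m2 <- mean((x - x_mean)^2)
m3 <- mean((x - x_mean)^3)
m4 <- mean((x - x_mean)^4)

# Moment-based shape statistics.
skewness        <- m3 / m2^1.5  # > 0: longer right tail
kurtosis        <- m4 / m2^2    # equals 3 for a normal distribution
excess_kurtosis <- kurtosis - 3 # equals 0 for a normal distribution

print(c(skewness = skewness, excess_kurtosis = excess_kurtosis))
```

The single value of 10 produces both a positive skewness and a positive excess kurtosis, and it also explains why the mean sits above the median.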
See also
summary for base R summary statistics.
frequency for categorical variable summaries.
w_mean, w_sd, w_median for individual weighted statistics.
t_test and oneway_anova for group comparisons.
Other descriptive: crosstab(), frequency()
Examples
# Load required packages and data
library(dplyr)
data(survey_data)
# Basic unweighted analysis
survey_data %>% describe(age)
#>
#> Descriptive Statistics
#> ----------------------
#> Variable Mean Median SD Range IQR Skewness N Missing
#> age 50.55 50 16.976 77 24 0.172 2500 0
#> ----------------------------------------
# Weighted analysis
survey_data %>% describe(age, weights = sampling_weight)
#>
#> Weighted Descriptive Statistics
#> -------------------------------
#> Variable Mean Median SD Range IQR Skewness Effective_N
#> age 50.514 50 17.084 77 25 0.159 2468.8
#> ----------------------------------------
# Multiple variables with custom statistics
survey_data %>% describe(age, income, life_satisfaction,
                         weights = sampling_weight,
                         show = c("mean", "sd", "skew"))
#>
#> Weighted Descriptive Statistics
#> -------------------------------
#> Variable Mean SD Skewness Effective_N
#> age 50.514 17.084 0.159 2468.8
#> income 3743.099 1423.966 0.724 2158.9
#> life_satisfaction 3.625 1.152 -0.498 2390.9
#> ----------------------------------------
# Grouped analysis
survey_data %>%
  group_by(region) %>%
  describe(age, weights = sampling_weight)
#>
#> Weighted Descriptive Statistics
#> -------------------------------
#>
#> Group: region = East
#> --------------------
#> ----------------------------------------
#> Variable Mean Median SD Range IQR Skewness Effective_N
#> age 52.278 53 17.595 77 24 0.098 477
#> ----------------------------------------
#>
#> Group: region = West
#> --------------------
#> ----------------------------------------
#> Variable Mean Median SD Range IQR Skewness Effective_N
#> age 50.067 49 16.927 77 24 0.17 1993.1
#> ----------------------------------------
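The weighted output above reports an Effective_N column that this page does not define. One common definition is Kish's approximation, (sum of weights)^2 / (sum of squared weights). Whether describe() uses exactly this formula is an assumption; the sketch below uses hypothetical weights and a hypothetical helper, kish_effective_n(), only to show how unequal weights shrink the effective sample size.

```r
# Kish's approximate effective sample size (hypothetical helper,
# not part of the package documented here).
kish_effective_n <- function(w) {
  w <- w[!is.na(w)]  # ignore missing weights
  sum(w)^2 / sum(w^2)
}

# Equal weights: the effective n equals the actual n.
equal_w <- rep(1, 5)
n_equal <- kish_effective_n(equal_w)

# Unequal weights: the effective n drops below the actual n.
unequal_w <- c(1, 1, 1, 1, 4)
n_unequal <- kish_effective_n(unequal_w)

print(c(equal = n_equal, unequal = n_unequal))
```

Under this definition, an effective n well below the raw n indicates that a few heavily weighted respondents dominate the estimates.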
