pearson_cor() shows you how strongly numeric variables are related to each
other. For example, is age related to income? Does satisfaction increase with
experience? This helps you understand patterns in your data.
The correlation tells you:
Direction: Positive (both increase together) or negative (one increases as other decreases)
Strength: How closely the variables move together (from 0 = no relationship to 1 = perfect relationship)
Significance: Whether the relationship is real or could be due to chance
Arguments
- data
Your survey data (a data frame or tibble)
- ...
The numeric variables you want to correlate. List two for a single correlation or more for a correlation matrix.
- weights
Optional survey weights for population-representative results
- conf.level
Confidence level for intervals (Default: 0.95 = 95%)
- alternative
Direction of the test:
"two.sided"(default),"less", or"greater".- use
How to handle missing values:
"pairwise"(default): Use all available data for each pair"listwise": Only use complete cases across all variables
- na.rm
Deprecated. Use
useinstead.
Value
Correlation results showing relationships between variables, including:
Correlation coefficient (r): Strength and direction of relationship
P-value: Whether the relationship is statistically significant
Confidence interval: Range of plausible correlation values
Sample size: Number of observations used Use
summary()for the full SPSS-style output with toggleable sections.
Details
Understanding the Results
Correlation coefficient (r) ranges from -1 to +1:
+1: Perfect positive relationship (as one goes up, the other always goes up)
0: No linear relationship
-1: Perfect negative relationship (as one goes up, the other always goes down)
Interpreting strength (absolute value of r):
0.00 - 0.10: Negligible relationship
0.10 - 0.30: Weak relationship
0.30 - 0.50: Moderate relationship
0.50 - 0.70: Strong relationship
0.70 - 0.90: Very strong relationship
0.90 - 1.00: Extremely strong relationship
P-value interpretation:
p < 0.001: Very strong evidence of a relationship
p < 0.01: Strong evidence of a relationship
p < 0.05: Moderate evidence of a relationship
p >= 0.05: No significant relationship found
A correlation of 0.65 with p < 0.001 means:
Strong positive relationship (r = 0.65)
As one variable increases, the other tends to increase
Very unlikely to be due to chance (p < 0.001)
About 42% of variation is shared (r-squared = 0.65 squared = 0.42)
When to Use This
Use Pearson correlation when:
Both variables are numeric and continuous
You expect a linear relationship
Data is roughly normally distributed
You want to measure strength of linear association
Don't use when:
Data has extreme outliers (consider Spearman instead)
Relationship is curved/non-linear
Variables are categorical (use chi-squared test)
You need to establish causation (correlation does not imply causation)
Tips for Success
Always plot your data first to check for non-linear patterns
Consider both statistical significance (p-value) and practical importance (r value)
Remember: correlation does not imply causation
Check for outliers that might inflate or deflate correlations
Use Spearman correlation for ordinal data or non-normal distributions
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521.
See also
cor for the base R correlation function.
cor.test for correlation significance testing.
spearman_rho for rank-based correlation (robust to outliers).
kendall_tau for ordinal correlation.
summary.pearson_cor for detailed output with toggleable sections.
Other correlation:
kendall_tau(),
spearman_rho()
Examples
# Load required packages and data
library(dplyr)
data(survey_data)
# Basic correlation between two variables
survey_data %>%
pearson_cor(age, income)
#> Pearson Correlation: age x income
#> r = -0.007, p = 0.761 , N = 2186
# Correlation matrix for multiple variables
survey_data %>%
pearson_cor(age, income, life_satisfaction)
#> Pearson Correlation: 3 variables
#> age x income: r = -0.007, p = 0.761
#> age x life_satisfaction: r = -0.029, p = 0.158
#> income x life_satisfaction: r = 0.448, p < 0.001 ***
#> 1/3 pairs significant (p < .05), N = 2186
# Weighted correlations
survey_data %>%
pearson_cor(age, income, weights = sampling_weight)
#> Pearson Correlation: age x income [Weighted]
#> r = -0.005, p = 0.828 , N = 2201
# Grouped correlations
survey_data %>%
group_by(region) %>%
pearson_cor(age, income, life_satisfaction)
#> [region = 1]
#> Pearson Correlation: 3 variables
#> age x income: r = 0.039, p = 0.415
#> age x life_satisfaction: r = -0.043, p = 0.350
#> income x life_satisfaction: r = 0.448, p < 0.001 ***
#> 1/3 pairs significant (p < .05), N = 429
#> [region = 2]
#> Pearson Correlation: 3 variables
#> age x income: r = -0.018, p = 0.462
#> age x life_satisfaction: r = -0.025, p = 0.274
#> income x life_satisfaction: r = 0.449, p < 0.001 ***
#> 1/3 pairs significant (p < .05), N = 1757
# Using tidyselect helpers
survey_data %>%
pearson_cor(where(is.numeric), weights = sampling_weight)
#> Pearson Correlation: 10 variables [Weighted]
#> id x age: r = 0.009, p = 0.648
#> id x income: r = 0.027, p = 0.206
#> id x political_orientation: r = -0.023, p = 0.271
#> id x environmental_concern: r = -0.006, p = 0.763
#> id x life_satisfaction: r = 0.011, p = 0.587
#> id x trust_government: r = 0.021, p = 0.310
#> id x trust_media: r = -0.008, p = 0.714
#> id x trust_science: r = 0.016, p = 0.439
#> id x sampling_weight: r = -0.016, p = 0.436
#> age x income: r = -0.005, p = 0.828
#> age x political_orientation: r = -0.029, p = 0.168
#> age x environmental_concern: r = 0.024, p = 0.244
#> age x life_satisfaction: r = -0.029, p = 0.150
#> age x trust_government: r = 0.005, p = 0.804
#> age x trust_media: r = 0.005, p = 0.820
#> age x trust_science: r = 0.003, p = 0.902
#> age x sampling_weight: r = -0.009, p = 0.645
#> income x political_orientation: r = -0.034, p = 0.125
#> income x environmental_concern: r = 0.015, p = 0.503
#> income x life_satisfaction: r = 0.450, p < 0.001 ***
#> income x trust_government: r = -0.001, p = 0.975
#> income x trust_media: r = -0.011, p = 0.629
#> income x trust_science: r = -0.024, p = 0.270
#> income x sampling_weight: r = -0.040, p = 0.058
#> political_orientation x environmental_concern: r = -0.584, p < 0.001 ***
#> political_orientation x life_satisfaction: r = -0.004, p = 0.836
#> political_orientation x trust_government: r = -0.057, p = 0.008 **
#> political_orientation x trust_media: r = 0.004, p = 0.835
#> political_orientation x trust_science: r = 0.040, p = 0.059
#> political_orientation x sampling_weight: r = 0.011, p = 0.590
#> environmental_concern x life_satisfaction: r = -0.003, p = 0.866
#> environmental_concern x trust_government: r = 0.064, p = 0.002 **
#> environmental_concern x trust_media: r = 0.002, p = 0.907
#> environmental_concern x trust_science: r = -0.014, p = 0.507
#> environmental_concern x sampling_weight: r = 0.020, p = 0.328
#> life_satisfaction x trust_government: r = 0.011, p = 0.604
#> life_satisfaction x trust_media: r = 0.020, p = 0.330
#> life_satisfaction x trust_science: r = -0.019, p = 0.371
#> life_satisfaction x sampling_weight: r = -0.019, p = 0.359
#> trust_government x trust_media: r = 0.012, p = 0.582
#> trust_government x trust_science: r = 0.031, p = 0.145
#> trust_government x sampling_weight: r = -0.009, p = 0.679
#> trust_media x trust_science: r = 0.024, p = 0.259
#> trust_media x sampling_weight: r = 0.022, p = 0.281
#> trust_science x sampling_weight: r = 0.001, p = 0.961
#> 4/45 pairs significant (p < .05), N = 2516
# Listwise deletion for missing data
survey_data %>%
pearson_cor(age, income, use = "listwise")
#> Pearson Correlation: age x income
#> r = -0.007, p = 0.761 , N = 2186
# --- Three-layer output ---
result <- survey_data %>%
pearson_cor(age, income, life_satisfaction, weights = sampling_weight)
result # compact one-line overview
#> Pearson Correlation: 3 variables [Weighted]
#> age x income: r = -0.005, p = 0.828
#> age x life_satisfaction: r = -0.029, p = 0.150
#> income x life_satisfaction: r = 0.450, p < 0.001 ***
#> 1/3 pairs significant (p < .05), N = 2201
summary(result) # full correlation, p-value, and N matrices
#>
#> Weighted Pearson Correlation
#> -----------------------------
#>
#> - Weights variable: sampling_weight
#> - Missing data handling: pairwise deletion
#> - Confidence level: 95.0%
#>
#>
#> Correlation Matrix:
#> -------------------
#> age income life_satisfaction
#> age 1.000 -0.005 -0.029
#> income -0.005 1.000 0.450
#> life_satisfaction -0.029 0.450 1.000
#> -------------------
#>
#> Significance Matrix (p-values):
#> -------------------------------
#> age income life_satisfaction
#> age 0.0000 0.8276 0.1496
#> income 0.8276 0.0000 0.0000
#> life_satisfaction 0.1496 0.0000 0.0000
#> -------------------------------
#>
#> Sample Size Matrix:
#> -------------------
#> age income life_satisfaction
#> age 2516 2201 2437
#> income 2201 2201 2130
#> life_satisfaction 2437 2130 2437
#> -------------------
#>
#> Pairwise Results:
#> ----------------
#> Variable_Pair r r_squared p_value CI_95 n sig
#> age × income -0.005 0.000 0.8276 [-0.046, 0.037] 2201
#> age × life_satisfaction -0.029 0.001 0.1496 [-0.069, 0.011] 2437
#> income × life_satisfaction 0.450 0.203 0.0000 [0.416, 0.483] 2130 ***
#> ----------------
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
summary(result, pvalue_matrix = FALSE) # hide p-values
#>
#> Weighted Pearson Correlation
#> -----------------------------
#>
#> - Weights variable: sampling_weight
#> - Missing data handling: pairwise deletion
#> - Confidence level: 95.0%
#>
#>
#> Correlation Matrix:
#> -------------------
#> age income life_satisfaction
#> age 1.000 -0.005 -0.029
#> income -0.005 1.000 0.450
#> life_satisfaction -0.029 0.450 1.000
#> -------------------
#>
#> Sample Size Matrix:
#> -------------------
#> age income life_satisfaction
#> age 2516 2201 2437
#> income 2201 2201 2130
#> life_satisfaction 2437 2130 2437
#> -------------------
#>
#> Pairwise Results:
#> ----------------
#> Variable_Pair r r_squared p_value CI_95 n sig
#> age × income -0.005 0.000 0.8276 [-0.046, 0.037] 2201
#> age × life_satisfaction -0.029 0.001 0.1496 [-0.069, 0.011] 2437
#> income × life_satisfaction 0.450 0.203 0.0000 [0.416, 0.483] 2130 ***
#> ----------------
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
