chi_square() helps you discover if two categorical variables are
related or independent. For example, is education level related to voting
preference? Or are they independent of each other?
The test tells you:
Whether the relationship is statistically significant
How strong the relationship is (effect sizes)
What patterns exist in your data
Usage
chi_square(data, ..., weights = NULL, correct = FALSE)
phi(data, ..., weights = NULL, correct = FALSE)
cramers_v(data, ..., weights = NULL, correct = FALSE)
goodman_gamma(data, ..., weights = NULL, correct = FALSE)Value
Test results showing whether the variables are related, including:
Chi-squared statistic and p-value
Observed vs expected frequencies
Effect sizes to measure relationship strength Use
summary()for the full SPSS-style output with toggleable sections.
Details
Understanding the Results
P-value: If p < 0.05, the variables are likely related (not independent)
p < 0.001: Very strong evidence of relationship
p < 0.01: Strong evidence of relationship
p < 0.05: Moderate evidence of relationship
p ≥ 0.05: No significant relationship found
Effect Sizes (How strong is the relationship?):
Cramer's V: Works for any table size (0 = no relationship, 1 = perfect relationship)
< 0.1: Negligible relationship
0.1-0.3: Small relationship
0.3-0.5: Medium relationship
0.5 or higher: Large relationship
Phi: Only for 2x2 tables (similar interpretation as Cramer's V)
Gamma: For ordinal data (-1 to +1, shows direction of relationship)
When to Use This
Use chi-squared test when:
Both variables are categorical (gender, region, education level, etc.)
You want to know if they're related or independent
You have at least 5 observations in most cells
References
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175.
Cramer, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
IBM Corp. (2023). IBM SPSS Statistics 29 Algorithms. IBM Corporation.
See also
chisq.test for the base R chi-squared test.
crosstab for detailed cross-tabulation tables.
frequency for single-variable frequency tables.
summary.chi_square for detailed output with toggleable sections.
Other hypothesis_tests:
ancova(),
binomial_test(),
chisq_gof(),
factorial_anova(),
fisher_test(),
friedman_test(),
kruskal_wallis(),
mann_whitney(),
mcnemar_test(),
oneway_anova(),
t_test(),
wilcoxon_test()
Examples
# Load required packages and data
library(dplyr)
data(survey_data)
# Basic chi-squared test for independence
survey_data %>% chi_square(gender, region)
#> Chi-Squared Test: gender × region
#> chi2(1) = 0.415, p = 0.519 , V = 0.013 (neglig.), N = 2500
# With weights
survey_data %>% chi_square(gender, education, weights = sampling_weight)
#> Chi-Squared Test: gender × education [Weighted]
#> chi2(3) = 4.403, p = 0.221 , V = 0.042 (neglig.), N = 2517
# Grouped analysis
survey_data %>%
group_by(region) %>%
chi_square(gender, employment)
#> [region = 1]
#> Chi-Squared Test: gender × employment
#> chi2(4) = 5.970, p = 0.201 , V = 0.111 (small), N = 485
#> [region = 2]
#> Chi-Squared Test: gender × employment
#> chi2(4) = 4.166, p = 0.384 , V = 0.045 (neglig.), N = 2015
# With continuity correction
survey_data %>% chi_square(gender, region, correct = TRUE)
#> Chi-Squared Test: gender × region
#> chi2(1) = 0.353, p = 0.553 , V = 0.012 (neglig.), N = 2500
# --- Three-layer output ---
result <- chi_square(survey_data, gender, education)
result # compact one-line overview
#> Chi-Squared Test: gender × education
#> chi2(3) = 3.470, p = 0.325 , V = 0.037 (neglig.), N = 2500
summary(result) # full detailed output with all sections
#>
#> Chi-Squared Test of Independence
#> ---------------------------------
#>
#> - Variables: gender × education
#>
#> Observed Frequencies:
#> education
#> gender Basic Secondary Intermediate Seco... Academic Secondary University
#> Male 401 289 320 184
#> Female 440 340 311 215
#>
#> Expected Frequencies:
#> education
#> gender Basic Secondary Intermediate Seco... Academic Secondary University
#> Male 401.662 300.41 301.366 190.562
#> Female 439.338 328.59 329.634 208.438
#>
#> Chi-Squared Test Results:
#> --------------------------------------------------
#> Chi_squared df p_value sig
#> 3.47 3 0.325
#> --------------------------------------------------
#>
#> Effect Sizes:
#> ----------------------------------------------------------------------
#> Measure Value p_value sig Interpretation
#> Cramer's V 0.037 0.325 Neglig.
#> Gamma -0.008 0.850 Weak
#> ----------------------------------------------------------------------
#> Table size: 2×4 | N = 2500
#> Note: Phi coefficient only shown for 2x2 tables
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
summary(result, cross_tabulation = FALSE) # hide cross-tabulation
#>
#> Chi-Squared Test of Independence
#> ---------------------------------
#>
#> - Variables: gender × education
#>
#> Observed Frequencies:
#> education
#> gender Basic Secondary Intermediate Seco... Academic Secondary University
#> Male 401 289 320 184
#> Female 440 340 311 215
#>
#> Expected Frequencies:
#> education
#> gender Basic Secondary Intermediate Seco... Academic Secondary University
#> Male 401.662 300.41 301.366 190.562
#> Female 439.338 328.59 329.634 208.438
#>
#> Chi-Squared Test Results:
#> --------------------------------------------------
#> Chi_squared df p_value sig
#> 3.47 3 0.325
#> --------------------------------------------------
#>
#> Effect Sizes:
#> ----------------------------------------------------------------------
#> Measure Value p_value sig Interpretation
#> Cramer's V 0.037 0.325 Neglig.
#> Gamma -0.008 0.850 Weak
#> ----------------------------------------------------------------------
#> Table size: 2×4 | N = 2500
#> Note: Phi coefficient only shown for 2x2 tables
#>
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
