Chi-Square Goodness-of-Fit Test

chisq_gof() tests whether observed frequencies of a categorical variable match expected frequencies. By default, it tests against equal proportions (uniform distribution). You can also specify custom expected proportions.

Think of it as:

Testing whether categories are equally distributed
Comparing observed distribution to a theoretical distribution
The one-sample version of the chi-square test

The test tells you:

Whether observed frequencies differ from expected frequencies
How strong the deviation is (chi-square statistic)
A frequency table with observed counts, expected counts, and residuals

Usage

chisq_gof(data, ..., expected = NULL, weights = NULL)

Arguments

data: Your survey data (data frame or tibble)
...: One or more categorical variables to test (tidyselect supported)
expected: Optional numeric vector of expected proportions (must sum to 1). Only used when a single variable is tested. If NULL (default), equal proportions are assumed.
weights: Optional survey weights for population-representative results

Value

Test results showing whether observed frequencies match expected, including:

Chi-square statistic (chi_squared) and p-value for each variable
Degrees of freedom
Frequency table with observed counts, expected counts, and residuals
Sample size (N)

Details

Understanding the Results

P-value: If p < 0.05, the observed distribution differs significantly from the expected distribution at the conventional 5% level

p < 0.001: Very strong evidence the distribution differs
p < 0.01: Strong evidence the distribution differs
p < 0.05: Moderate evidence the distribution differs
p >= 0.05: No significant deviation from expected distribution
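The thresholds above can be sketched as a small lookup. The helper name `evidence_label()` is hypothetical and is not part of the package; it only mirrors the wording of the list.

```r
# Hypothetical helper: map p-values to the evidence wording used above.
# Intervals are closed on the left, so p = 0.05 counts as "not significant".
evidence_label <- function(p) {
  as.character(cut(
    p,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("very strong", "strong", "moderate", "not significant"),
    right = FALSE
  ))
}

# p-values in the style of the example output further down
labels <- evidence_label(c(0.025, 0.012, 0.0005, 0.05))
print(labels)
```

A non-significant result only means the data are compatible with the expected distribution; it does not show that the distribution matches exactly.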

Residuals: The difference between observed and expected counts. Large positive residuals indicate a category has more cases than expected; large negative residuals indicate fewer cases than expected.
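As a minimal sketch of how such a residual table comes about, the following base R code rebuilds it by hand. The counts (485 and 2015) are the two region sizes shown in the grouped example output; the column names are illustrative and may not match what chisq_gof() returns.

```r
# Observed counts per category (region sizes from the grouped example output)
observed <- c(region_1 = 485, region_2 = 2015)

# Default null hypothesis: every category has the same expected share
expected_prop <- rep(1 / length(observed), length(observed))
expected <- sum(observed) * expected_prop

# Raw residual: observed count minus expected count
residual <- observed - expected

gof_table <- data.frame(
  category = names(observed),
  observed = as.numeric(observed),
  expected = expected,
  residual = as.numeric(residual)
)
print(gof_table)
```

Here region 1 has 765 fewer cases than expected under equal proportions and region 2 has 765 more.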

When to Use This

Use this test when:

You want to check whether a categorical variable follows a specific distribution
You want to test if categories are equally distributed (uniform)
You have a single categorical variable and a hypothesised distribution

The Chi-Square Goodness-of-Fit Statistic

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

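where O_i is the observed frequency in category i and E_i is the expected frequency in category i (sample size times the expected proportion for that category); the sum runs over all categories.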

Degrees of freedom = number of categories - 1.
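The formula can be checked by hand. This sketch uses the region counts from the grouped example output (485 and 2015) under equal expected proportions, and compares the result with base R's chisq.test(). It illustrates the arithmetic only and is not necessarily how chisq_gof() is implemented.

```r
# Observed counts for the two regions (from the grouped example output)
observed <- c(485, 2015)

# Equal proportions: each category is expected to hold N / k cases
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
df <- length(observed) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_squared, df = df, lower.tail = FALSE)

# Cross-check with base R (a plain vector triggers the goodness-of-fit test)
base_r <- chisq.test(observed)

print(round(chi_squared, 3))  # 936.36
print(df)                     # 1
```

The value agrees with the chi2(1) = 936.360 reported for region in the examples.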

Relationship to Other Tests

For testing association between two categorical variables: Use chi_square() instead
For testing a single binary proportion: Use binomial_test() instead
For small samples where expected frequencies are below 5: Use fisher_test() instead
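Whether the small-sample caveat applies can be checked before running the test. The counts below are hypothetical and only illustrate the rule of thumb that every expected frequency should be at least 5.

```r
# Hypothetical counts for a four-category variable in a small sample
observed <- c(a = 3, b = 7, c = 2, d = 4)

# Expected counts under equal proportions: 16 cases / 4 categories = 4 each
expected_prop <- rep(1 / length(observed), length(observed))
expected <- sum(observed) * expected_prop

# Flag categories whose expected count falls below 5
small_cells <- expected < 5
if (any(small_cells)) {
  message(
    sum(small_cells), " of ", length(expected),
    " expected counts are below 5; the chi-square approximation may be poor."
  )
}
```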

SPSS Equivalent

SPSS: NPAR TESTS /CHISQUARE=variable /EXPECTED=EQUAL
or: NPAR TESTS /CHISQUARE=variable /EXPECTED=50 30 20
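SPSS accepts expected values as relative frequencies (50 30 20), while the expected argument described above requires proportions that sum to 1. A minimal sketch of the conversion, assuming the SPSS values are in the same category order as the variable:

```r
# Expected values as written in the SPSS syntax
spss_expected <- c(50, 30, 20)

# Rescale to proportions that sum to 1
expected <- spss_expected / sum(spss_expected)
stopifnot(isTRUE(all.equal(sum(expected), 1)))

print(expected)  # 0.5 0.3 0.2

# These proportions can then be passed on, as in the Examples section:
# survey_data %>% chisq_gof(interview_mode, expected = expected)
```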

References

Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157-175.

Examples

# Load required packages and data
# (the package that provides chisq_gof() and survey_data must also be attached)
library(dplyr)
data(survey_data)

# Test whether gender is equally distributed
survey_data %>%
  chisq_gof(gender)
#> Chi-Square Goodness-of-Fit Test: gender
#>   chi2(1) = 5.018, p = 0.025 *, N = 2500
#> Use summary() for detailed output.

# Test multiple variables at once
survey_data %>%
  chisq_gof(gender, region, education)
#> Chi-Square Goodness-of-Fit Test: gender
#>   chi2(1) = 5.018, p = 0.025 *, N = 2500
#> Chi-Square Goodness-of-Fit Test: region
#>   chi2(1) = 936.360, p < 0.001 ***, N = 2500
#> Chi-Square Goodness-of-Fit Test: education
#>   chi2(3) = 156.454, p < 0.001 ***, N = 2500
#> Use summary() for detailed output.

# Custom expected proportions
survey_data %>%
  chisq_gof(interview_mode, expected = c(0.5, 0.3, 0.2))
#> Chi-Square Goodness-of-Fit Test: interview_mode
#>   chi2(2) = 95.413, p < 0.001 ***, N = 2500
#> Use summary() for detailed output.

# With weights
survey_data %>%
  chisq_gof(gender, weights = sampling_weight)
#> Chi-Square Goodness-of-Fit Test: gender [Weighted]
#>   chi2(1) = 6.310, p = 0.012 *, N = 2516
#> Use summary() for detailed output.

# Grouped analysis
survey_data %>%
  group_by(region) %>%
  chisq_gof(education)
#> [region = 1]
#> Chi-Square Goodness-of-Fit Test: education
#>   chi2(3) = 34.645, p < 0.001 ***, N = 485
#> [region = 2]
#> Chi-Square Goodness-of-Fit Test: education
#>   chi2(3) = 122.888, p < 0.001 ***, N = 2015
#> Use summary() for detailed output.