
Pearson’s Chi-Square Test

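Pearson's Chi-Square Test is a statistical test for categorical data that determines whether observed frequencies differ significantly from the frequencies expected under a null hypothesis. It comes in two main forms. The goodness-of-fit test compares the observed counts of a single categorical variable with the counts expected under a hypothesized distribution. The test of independence, which is the focus of this page, assesses the relationship between two categorical variables by comparing the observed count in each cell of a contingency table with the count that would be expected if the two variables were independent.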
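
To make the goodness-of-fit form concrete, here is a minimal sketch using SciPy's scipy.stats.chisquare. The die-roll counts are hypothetical data chosen for illustration; the expected counts assume a fair die (60 rolls / 6 faces = 10 per face).

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical data: counts of each face from 60 rolls of a six-sided die
observed = np.array([8, 9, 12, 11, 6, 14])
# Expected counts if the die is fair: total rolls split evenly over 6 faces
expected = np.full(6, observed.sum() / 6)

# Goodness-of-fit: one categorical variable vs. a hypothesized distribution
result = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

Here the statistic is 4.2 with 5 degrees of freedom, far from significant, so these counts give no evidence that the die is unfair.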

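The test statistic is computed cell by cell: for each category (or each cell of the contingency table), take the squared difference between the observed frequency O and the expected frequency E, divide it by E, and add up these terms: χ² = Σ (O − E)² / E. In the test of independence, the expected frequency of a cell is (row total × column total) / grand total. Under the null hypothesis the statistic approximately follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom for an r × c table (k − 1 for a goodness-of-fit test with k categories), and it is compared with the critical value of that distribution at the chosen significance level to determine whether the result is significant.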
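
The formula can be checked by hand. This sketch computes the expected counts and the statistic with NumPy for the table [[30, 20], [40, 10]], then cross-checks the result against scipy.stats.chi2_contingency. It passes correction=False because SciPy otherwise applies Yates' continuity correction to 2x2 tables, which would not match the plain formula above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# A 2x2 contingency table of observed counts
observed = np.array([[30, 20],
                     [40, 10]])

# Expected count for each cell under independence:
# (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson's statistic: sum over all cells of (O - E)^2 / E
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom for an r x c table: (r - 1)(c - 1)
r, c = observed.shape
dof = (r - 1) * (c - 1)

# Cross-check with SciPy; correction=False disables Yates' continuity correction
stat, p, dof_scipy, expected_scipy = chi2_contingency(observed, correction=False)
print(expected)
print(f"chi2 = {chi2_stat:.4f} (SciPy: {stat:.4f}), dof = {dof}")
```

The expected counts are 35 and 15 in each row, and the statistic is 100/21 ≈ 4.762 with 1 degree of freedom.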

The null hypothesis of the test of independence is that the two categorical variables are independent, that is, there is no relationship between them. The alternative hypothesis is that the two variables are related. If the test statistic is greater than the critical value at the chosen significance level (equivalently, if the p-value is below that level), the null hypothesis is rejected and it is concluded that there is a statistically significant relationship between the two variables.
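
The two equivalent decision rules can be sketched with scipy.stats.chi2. The helper chi_square_decision is hypothetical, written only for illustration; the input 4.762 is the uncorrected Pearson statistic for the 2x2 table [[30, 20], [40, 10]].

```python
from scipy.stats import chi2

def chi_square_decision(chi2_stat: float, dof: int, alpha: float = 0.05):
    """Hypothetical helper: apply both decision rules to a chi-square statistic.

    Returns (critical_value, p_value, reject_h0).
    """
    # Critical value: the (1 - alpha) quantile of the chi-square distribution
    critical_value = chi2.ppf(1 - alpha, dof)
    # p-value: probability of a statistic at least this large if H0 is true
    p_value = chi2.sf(chi2_stat, dof)
    # Reject H0 when the statistic exceeds the critical value
    reject_h0 = chi2_stat > critical_value
    return critical_value, p_value, reject_h0

critical_value, p_value, reject_h0 = chi_square_decision(4.762, dof=1)
print(f"critical = {critical_value:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With 1 degree of freedom and α = 0.05 the critical value is about 3.841; the statistic exceeds it, and correspondingly the p-value (about 0.029) is below 0.05.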

Pearson's Chi-Square Test is widely used in fields such as sociology, psychology, and biology to test whether two categorical variables are independent. For example, it can be used to determine whether there is a relationship between gender and voting patterns, or between smoking and lung cancer.

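It is important to note that the chi-square distribution is only a large-sample approximation, so the test requires a sufficiently large sample. A common rule of thumb is that every expected frequency should be at least 5 (a looser version, due to Cochran, allows up to 20% of cells below 5 as long as none is below 1). If the sample size is too small or the expected frequencies are too low, alternatives such as Fisher's exact test or a p-value computed by Monte Carlo simulation may be used instead.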
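
A minimal sketch of checking this assumption before trusting the chi-square p-value, assuming SciPy. The helper independence_test and its min_expected parameter are hypothetical. It applies the strict "every expected count at least 5" rule and falls back to scipy.stats.fisher_exact only for 2x2 tables; other small tables are left to a Monte Carlo approach, which this sketch does not implement.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def independence_test(table, min_expected: float = 5.0):
    """Hypothetical helper: return (test_name, p_value) for a contingency table.

    Uses Pearson's chi-square test when every expected count is at least
    min_expected, otherwise Fisher's exact test for 2x2 tables.
    """
    table = np.asarray(table)
    # Uncorrected Pearson test; also gives the expected counts
    _, p_value, _, expected = chi2_contingency(table, correction=False)
    if (expected >= min_expected).all():
        return "pearson_chi_square", p_value
    if table.shape == (2, 2):
        # Exact test: does not rely on the large-sample approximation
        _, p_exact = fisher_exact(table)
        return "fisher_exact", p_exact
    raise ValueError("Expected counts too small; consider a Monte Carlo p-value")

# Small table: every expected count is 2, so Fisher's exact test is used
name_small, p_small = independence_test([[3, 1], [1, 3]])
# Larger table: expected counts are 35 and 15, so the chi-square test is used
name_large, p_large = independence_test([[30, 20], [40, 10]])
print(name_small, round(p_small, 4))
print(name_large, round(p_large, 4))
```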

Here's an implementation of Pearson's Chi-Square Test of independence in Python, using SciPy's chi2_contingency function:

import numpy as np
from scipy.stats import chi2_contingency

def pearson_chi_square_test(observed_frequencies):
    # correction=False disables Yates' continuity correction, which SciPy
    # applies by default to 2x2 tables; this gives the plain Pearson statistic
    stat, p_value, dof, expected_frequencies = chi2_contingency(
        observed_frequencies, correction=False
    )
    return p_value

# Example usage: a 2x2 contingency table of observed counts
observed_frequencies = np.array([[30, 20], [40, 10]])
p_value = pearson_chi_square_test(observed_frequencies)

# Compare the p-value with a 5% significance level
if p_value < 0.05:
    print("Reject the null hypothesis. There is a significant relationship between the variables.")
else:
    print("Fail to reject the null hypothesis. There is no significant evidence of a relationship between the variables.")

In this example, observed_frequencies is a 2x2 array holding the counts of the contingency table, and p_value is the p-value returned by the test. If p_value is less than 0.05, we reject the null hypothesis and conclude that there is a significant relationship between the variables. Otherwise, we fail to reject the null hypothesis: the data do not provide significant evidence of a relationship, which is not the same as showing that the variables are independent.
