StatEL : Compare distributions of a qualitative variable between 2 samples

1 - Principles of the comparison test for the distribution of a qualitative variable :

We have 2 samples a and b. The proportions of the different modalities of a qualitative variable in these samples are estimators of the proportions of the same modalities in the 2 populations A and B from which the samples were drawn. The question is : if the proportions differ between the 2 samples (which is easy to check), do they also differ between the 2 populations ?

Another way to put the question is to ask whether the 2 samples were drawn from a single population or from 2 different populations.

Null hypothesis : "H0 = pa and pb (proportions of the studied parameter in the 2 populations) are the same"
Alternative hypothesis : "H1 = pa and pb are different"

The test is based on the calculation of Khi² (or Chi²), which compares the observed counts in the contingency table with the theoretical counts, calculated under the hypothesis that the proportions are equal in the 2 populations.

Each theoretical count is calculated by multiplying the row total by the column total and dividing by the grand total of the contingency table.

Khi² is calculated as below, only if each theoretical count is at least equal to 5 :
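
As an illustration of this rule, here is a minimal Python sketch that computes the theoretical counts of a contingency table. The function name expected_counts and the sample data are hypothetical and are not part of StatEL.

```python
def expected_counts(observed):
    """Return the theoretical counts of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Theoretical count = row total * column total / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical data: rows = samples a and b, columns = 3 modalities
observed = [[20, 30, 50], [30, 30, 40]]
expected = expected_counts(observed)
print(expected)
```

In this example both samples contain 100 subjects, so each theoretical count is simply half of the corresponding column total.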

Khi² = Σ (O - T)² / T, where O is the observed count and T the theoretical count of a cell, and the sum is taken over all cells of the contingency table.
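
The statistic is the sum, over all cells, of (observed - theoretical)² / theoretical. A minimal Python sketch of that calculation follows. The function name chi_square and the data are hypothetical, and raising an error when a theoretical count is below 5 is only one possible way to enforce the condition stated here, not necessarily what StatEL does.

```python
def chi_square(observed):
    """Khi² statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            theo = row_totals[i] * col_totals[j] / grand_total
            if theo < 5:
                # Assumption: refuse to compute when the condition is not met
                raise ValueError("a theoretical count is lower than 5")
            chi2 += (obs - theo) ** 2 / theo
    return chi2


# Hypothetical data: rows = samples a and b, columns = 3 modalities
chi2 = chi_square([[20, 30, 50], [30, 30, 40]])
print(round(chi2, 3))
```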

If the contingency table contains only 4 cells (2 x 2) and if theoretical counts are lower than 5 (but at least equal to 3), we may apply the Yates correction :

Khi² = Σ (|O - T| - 0.5)² / T, where O is the observed count and T the theoretical count of a cell, and the sum is taken over the 4 cells of the contingency table.
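
The Yates correction subtracts 0.5 from each absolute difference between observed and theoretical counts before squaring. A minimal Python sketch for a 2 x 2 table follows. The function name chi_square_yates and the data are hypothetical, and the lower bound of 3 is taken from the condition stated here.

```python
def chi_square_yates(table):
    """Khi² with Yates correction for a 2 x 2 table [[a, b], [c, d]]."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("the Yates correction applies to 2 x 2 tables only")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            theo = row_totals[i] * col_totals[j] / grand_total
            if theo < 3:
                raise ValueError("a theoretical count is lower than 3")
            # Continuity correction: reduce |observed - theoretical| by 0.5
            chi2 += (abs(table[i][j] - theo) - 0.5) ** 2 / theo
    return chi2


# Hypothetical data: theoretical counts are 5.5 and 4.5 in each row
chi2_yates = chi_square_yates([[8, 2], [3, 7]])
print(round(chi2_yates, 3))
```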

Khi² is compared to the limit value Khi²lim given in the Khi² table for (number of modalities - 1) degrees of freedom and a p-level (risk of making a mistake) of 5%. If Khi² > Khi²lim, we conclude that there is a significant difference in the distribution of the qualitative variable between the 2 samples.
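
The decision rule can be sketched in Python as follows. The limit values are the usual 5% entries of the Khi² table for 1 to 5 degrees of freedom. The names CHI2_LIMIT_5PCT and is_significant are hypothetical and are not part of StatEL.

```python
# Limit values of the Khi² table at the 5% level, indexed by degrees of freedom
CHI2_LIMIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def is_significant(chi2, n_modalities):
    """Return True if Khi² exceeds the 5% limit value for (n_modalities - 1) df."""
    df = n_modalities - 1
    if df not in CHI2_LIMIT_5PCT:
        raise ValueError("limit value not tabulated in this sketch")
    return chi2 > CHI2_LIMIT_5PCT[df]


# Hypothetical result: Khi² = 3.111 for a variable with 3 modalities (2 df)
print(is_significant(3.111, 3))
```

With 3 modalities the limit value is 5.991, so a Khi² of 3.111 does not lead to rejecting the null hypothesis.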

Another solution, if the contingency table contains only 4 cells (2 x 2) and if theoretical counts are lower than 5, is to use Fisher's exact test, which gives the exact probability of obtaining a difference between the proportions of the modalities in the 2 samples at least as large as the one observed. If this probability is too low, we reject the null hypothesis.
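
A minimal Python sketch of Fisher's exact test for a 2 x 2 table follows. It uses the hypergeometric distribution with fixed row and column totals. The two-sided convention chosen here (summing the probabilities of all tables that are no more likely than the observed one) is an assumption, since the convention used by StatEL is not stated. The function name and the data are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    # Probability of each possible value x of the top-left cell, margins fixed
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
        for x in range(lowest, highest + 1)
    }
    p_observed = probs[a]
    # Small relative tolerance so that ties are not lost to rounding
    return sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))


# Hypothetical data: 3 of 4 subjects in sample a, 1 of 4 in sample b
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```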

If the 2 studied series concern a single group of subjects (paired data) and if the studied qualitative variable has only 2 modalities, then StatEL performs McNemar's test. If the studied qualitative variable has more than 2 modalities (e.g. "more", "equal", "less"), then StatEL performs the Stuart-Maxwell test and the Bhapkar test, which offers higher precision. The algorithms involved are based on matrix formulas; their details are not shown here.
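
For the case with 2 modalities, the textbook form of the McNemar statistic uses only the discordant pairs, that is, the subjects whose modality changes between the 2 series. A minimal Python sketch follows. It uses the version without continuity correction, which is an assumption, since the exact variant applied by StatEL is not stated. The function name and the data are hypothetical.

```python
def mcnemar_statistic(b, c):
    """McNemar Khi² (1 degree of freedom) from the 2 discordant pair counts.

    b: subjects going from modality 1 to modality 2
    c: subjects going from modality 2 to modality 1
    """
    if b + c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return (b - c) ** 2 / (b + c)


# Hypothetical data: 15 subjects change one way, 5 the other way
stat = mcnemar_statistic(15, 5)
# 3.841 is the 5% limit value of the Khi² table for 1 degree of freedom
print(stat, stat > 3.841)
```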

2 - Launching the comparison test for the distribution of a qualitative variable :

StatEL allows you to work with different kinds of data :

raw data :