1 - Principles of the comparison test for the distribution of a qualitative variable:
We have 2 samples, a and b. In each sample, the proportions of the different modalities of a qualitative variable are estimators of the proportions of the same modalities in the 2 populations, A and B, from which the samples were drawn. The question is: if the proportions differ between the 2 samples (which is easy to check), do they also differ between the 2 populations?
Another way to put the question is to ask whether the 2 samples were drawn from a single population or from 2 different populations.
- Null hypothesis: "H0 = pa and pb (the proportions of the studied parameter in the 2 populations) are the same"
- Alternative hypothesis: "H1 = pa and pb are different"
The test is based on the calculation of Khi² (or Chi²), which compares the observed counts in the contingency table with the theoretical (expected) counts, calculated under the hypothesis that the proportions are equal in the 2 populations.
Each theoretical count is calculated by multiplying the row total by the column total and dividing the result by the grand total of the contingency table.
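The rule above for theoretical counts can be sketched in a few lines of Python. This is only an illustration, not StatEL's own code: the function name `expected_counts` and the 2 x 2 table are hypothetical, chosen for the example.

```python
def expected_counts(observed):
    """Return the theoretical counts of a contingency table.

    `observed` is a list of rows, each row being a list of observed counts.
    Each theoretical count is (row total x column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical table: rows = samples a and b, columns = the 2 modalities.
table = [[10, 20], [30, 40]]
print(expected_counts(table))  # prints [[12.0, 18.0], [28.0, 42.0]]
```

Here the row totals are 30 and 70, the column totals 40 and 60, and the grand total 100, so the first theoretical count is 30 x 40 / 100 = 12.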
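- Khi² is calculated as follows, provided that each theoretical count is at least equal to 5:
  Khi² = Σ (O - T)² / T
  where O is the observed count and T the theoretical count of a cell, and the sum runs over all cells of the contingency table.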
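- If the contingency table contains only 4 cells (2 x 2) and some theoretical counts are lower than 5 (but at least equal to 3), the Yates correction may be applied:
  Khi² = Σ (|O - T| - 0.5)² / T
  (O and T being the observed and theoretical counts of each of the 4 cells)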
- Khi² is compared to the limit value given in the Khi² table for (number of modalities - 1) degrees of freedom, with a p-level (risk of making a mistake) of 5%. If Khi² > Khi²lim, we conclude that the distribution of the qualitative variable differs between the 2 populations from which the samples were drawn.
- Another solution, if the contingency table contains only 4 cells (2 x 2) and the theoretical counts are lower than 5, is Fisher's exact test, which gives the exact probability of obtaining a difference between the proportions of the 2 samples at least as large as the one observed. If this probability is too low, we reject the null hypothesis.
- If the 2 studied series concern a single group of subjects and the studied qualitative variable has only 2 modalities, StatEL performs McNemar's test. If the studied qualitative variable has more than 2 modalities (e.g. "more", "equal", "less"), StatEL performs the Stuart-Maxwell test and the Bhapkar test, which offers higher precision. The underlying algorithms rely on matrix formulas, whose details are not given here.
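For readers who want to check a result by hand, the Khi² decision rule, the Yates correction and Fisher's exact test described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not StatEL's implementation: all function names and both tables are hypothetical, the limit values are the usual Khi² table entries at the 5% level, and the Fisher p-value uses the common two-sided convention (summing the probabilities of all tables no more likely than the observed one), which may differ from the convention StatEL applies.

```python
from math import comb

# Limit values of the Khi² table at the 5% level, for 1 to 4 degrees of freedom.
KHI2_LIMIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def khi2(observed, yates=False):
    """Khi² statistic of a contingency table (list of rows of counts).

    With yates=True, 0.5 is subtracted from each absolute difference
    before squaring (intended for 2 x 2 tables only).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            theo = row_totals[i] * col_totals[j] / total  # theoretical count
            diff = abs(obs - theo)
            if yates:
                diff -= 0.5
            stat += diff ** 2 / theo
    return stat


def differs_at_5pct(observed, yates=False):
    """True if Khi² exceeds the 5% limit value for the table's degrees of freedom."""
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return khi2(observed, yates) > KHI2_LIMIT_5PCT[df]


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2 x 2 table (assumed convention)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of a table whose top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical data: rows = samples a and b, columns = the 2 modalities.
large_table = [[12, 8], [5, 15]]   # all theoretical counts are >= 5
small_table = [[3, 1], [1, 3]]     # theoretical counts are below 5

print(khi2(large_table), differs_at_5pct(large_table))
print(fisher_exact_2x2(small_table))
```

For a table with 2 rows (the 2 samples), the degrees of freedom computed as (rows - 1) x (columns - 1) reduce to (number of modalities - 1), as stated above.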
By default, the modalities of the coding variable are "0" and "1". You can remove them by selecting them in the "List of modalities" and clicking the "Remove the modality" button. Then specify each modality of the coding variable, one by one, by clicking "Add a modality".
In the example mentioned, StatEL states clearly that the 2 studied groups have different distributions, with a p-value (risk of making a mistake) lower than 0.01 (i.e. 1%).
Finally, a histogram displays the distribution of the different modalities: