1 - Principles of the Khi² test of association:
We wish to test whether, in a population, 2 qualitative variables are independent or linked, using the counts observed in a sample (example: are eye color and hair color linked?).
- Null hypothesis: "H0 = the two qualitative variables are independent"
- Alternative hypothesis: "H1 = the two qualitative variables are linked"
The test is based on the calculation of Khi² (or Chi²), which compares the observed counts of the contingency table with the theoretical counts calculated under the hypothesis that the two qualitative variables are independent.
We have a contingency table with c columns and l rows, containing the counts for variable 1 (c modalities) crossed with variable 2 (l modalities).
Each theoretical count is calculated by multiplying the sum of its row by the sum of its column and dividing by the total count of the contingency table.
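As a minimal sketch of the calculation described above, the following Python function derives the theoretical counts from a table of observed counts. The function name `theoretical_counts` and the eye/hair figures are illustrative assumptions, not values or code taken from StatEL.

```python
def theoretical_counts(observed):
    """Theoretical counts under independence: row sum * column sum / total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


# Hypothetical 2 x 3 contingency table (rows: eye color, columns: hair color)
observed = [[20, 30, 50],
            [30, 20, 50]]

expected = theoretical_counts(observed)
for row in expected:
    print(row)

# The Khi² test is only valid if every theoretical count is at least 5
print(all(t >= 5 for row in expected for t in row))
```

The theoretical table always keeps the same row sums and column sums as the observed table, which is a quick way to check the calculation by hand.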
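- Khi² is calculated as follows, only if each theoretical count is at least equal to 5:
Khi² = Σ (O - T)² / T, where O is the observed count and T the theoretical count of a cell, and the sum runs over all cells of the contingency table.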
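- If the contingency table contains only 4 cells (2 x 2) and if some theoretical counts are lower than 5 (but at least equal to 3), we may apply the Yates correction:
Khi² = Σ (|O - T| - 0.5)² / T, with O and T the observed and theoretical counts of each cell.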
- Khi² is compared with the limit value given in the Khi² table for (l - 1) x (c - 1) degrees of freedom and a p-level (risk of making a mistake) of 5%. If Khi² > Khi²lim, we reject the null hypothesis and conclude that the 2 qualitative variables are linked.
- Another solution, if the contingency table contains only 4 cells (2 x 2) and if some theoretical counts are lower than 5, is to use Fisher's exact test, which gives the exact probability of obtaining, under the null hypothesis, a difference between the proportions of the modalities at least as large as the one observed. If this probability is too low, we reject the null hypothesis.
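The steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not StatEL's own implementation: the function names and the example tables are hypothetical, the Yates term is clamped at 0 so that the correction never overshoots, and the Fisher p-value uses the common two-sided convention (summing every table at most as probable as the observed one), which may differ from the convention StatEL applies.

```python
from math import comb

# Limit values of Khi² at the 5% level for 1 to 4 degrees of freedom
KHI2_LIMIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def khi2_statistic(observed, yates=False):
    """Sum of (O - T)^2 / T; with yates=True, 0.5 is subtracted from |O - T|."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    khi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            t = row_sums[i] * col_sums[j] / total  # theoretical count
            diff = abs(o - t)
            if yates:
                diff = max(diff - 0.5, 0.0)  # clamp: assumption of this sketch
            khi2 += diff ** 2 / t
    return khi2


def degrees_of_freedom(observed):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c

    def weight(x):
        # Number of ways to get x in the top-left cell with fixed margins
        return comb(r1, x) * comb(r2, c1 - x)

    w_obs = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= w_obs)
    return extreme / comb(r1 + r2, c1)


# Hypothetical 2 x 3 table: every theoretical count is at least 5
table = [[20, 30, 50], [30, 20, 50]]
khi2 = khi2_statistic(table)
limit = KHI2_LIMIT_5PCT[degrees_of_freedom(table)]
print(khi2, limit, "linked" if khi2 > limit else "H0 not rejected")

# Hypothetical 2 x 2 table with small theoretical counts: use Fisher's exact test
small = [[8, 2], [1, 5]]
p = fisher_exact_2x2(small)
print(p, "linked" if p < 0.05 else "H0 not rejected")
```

Working with integer weights in `fisher_exact_2x2` avoids floating-point ties when comparing each possible table with the observed one; only the final division produces a float.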
2 - Running the Khi² test of association:
In these 2 options, StatEL requires you to specify the different modalities of the 2 qualitative variables:
By default, the modalities of the coding variable are "0" and "1". You can remove them by selecting them in the "List of modalities" and clicking on the "Remove the modality" button. Then specify each modality of the coding variable, one by one, by clicking on "Add a modality".
In the example mentioned, StatEL states clearly that the 2 qualitative variables are linked, with a p-value (risk of making a mistake) lower than 0.02 (i.e. 2%).
Finally, the graph displays the distribution of the modalities of variable 1 according to the modalities of variable 2: