What is the information in the DIF analysis output?

There are several columns in the DIF analysis output. The first column, “Item”, is the name of the item. The second and third columns are the Mantel-Haenszel chi-square statistic and its associated p-value. Note that jMetrik uses the Cochran-Mantel-Haenszel statistic for stratified 2 x k tables. It is a generalization of the Mantel-Haenszel statistic for stratified 2 x 2 tables, and it works with both binary and polytomous items. One difference between the Cochran-Mantel-Haenszel statistic for binary items and the Mantel-Haenszel statistic is that the former does not use a correction for continuity. Therefore, with binary items, the value reported by jMetrik may differ slightly from the Mantel-Haenszel value reported by other programs.
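
To illustrate why the continuity correction changes the reported value, here is a minimal sketch of the textbook Mantel-Haenszel chi-square for stratified 2 x 2 tables, computed with and without the correction. It is not jMetrik's code: the function name, the (a, b, c, d) table layout, and the example counts are assumptions made for illustration.

```python
def mantel_haenszel_chi_square(tables, continuity_correction=True):
    """Textbook Mantel-Haenszel chi-square for stratified 2 x 2 tables.

    tables: list of (a, b, c, d) counts, one tuple per stratum, where
        a = reference group correct, b = reference group incorrect,
        c = focal group correct,     d = focal group incorrect.
    (This layout is an assumption for the example.)
    """
    sum_observed = 0.0
    sum_expected = 0.0
    sum_variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # the variance term is undefined for a single examinee
        n_ref, n_foc = a + b, c + d
        n_correct, n_incorrect = a + c, b + d
        sum_observed += a
        sum_expected += n_ref * n_correct / n
        sum_variance += n_ref * n_foc * n_correct * n_incorrect / (n * n * (n - 1))
    if sum_variance == 0:
        raise ValueError("no stratum contributes any variance")
    difference = abs(sum_observed - sum_expected)
    if continuity_correction:
        # The correction subtracts 0.5 before squaring.
        difference = max(difference - 0.5, 0.0)
    return difference ** 2 / sum_variance


# Hypothetical counts for two strata.
example_tables = [(10, 5, 6, 9), (8, 2, 5, 5)]
corrected = mantel_haenszel_chi_square(example_tables, continuity_correction=True)
uncorrected = mantel_haenszel_chi_square(example_tables, continuity_correction=False)
print(f"with correction: {corrected:.4f}, without correction: {uncorrected:.4f}")
```

The uncorrected value is never smaller than the corrected one, which is the direction of the difference you should expect when comparing jMetrik with a program that applies the correction.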

The “Valid N” column lists the number of examinees included in the Mantel-Haenszel statistic. Examinees who did not provide a value for the DIF group code are excluded from the analysis. In addition, any stratum table with only one examinee is not included in the analysis. As a result, you could lose several examinees if several tables contain only a single examinee. In this situation, try using the “Deciles” or “Quintiles” options to preserve more data.
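
The two exclusion rules can be sketched as a simple count. This is only an illustration of the rules described above, not jMetrik's implementation; the record layout (a group code that is None when missing, plus a matching score that defines the stratum) and the function name are assumptions.

```python
from collections import Counter


def count_valid_examinees(records):
    """Count examinees that survive the two exclusion rules.

    records: iterable of (group_code, matching_score) pairs, where
    group_code is None when the examinee has no DIF group code.
    """
    # Rule 1: drop examinees with a missing DIF group code.
    with_group = [(group, score) for group, score in records if group is not None]
    # Rule 2: drop strata (tables) that contain only one examinee.
    stratum_sizes = Counter(score for _, score in with_group)
    return sum(size for size in stratum_sizes.values() if size > 1)


# Hypothetical data: one missing group code and one single-examinee stratum.
example_records = [("R", 1), ("F", 1), (None, 1), ("R", 2), ("F", 3), ("R", 3)]
valid_n = count_valid_examinees(example_records)
print(valid_n)
```

Grouping matching scores into coarser strata, as the decile and quintile options do, makes single-examinee strata less likely and so keeps more examinees in the count.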

The “E.S. (95% C. I.)” columns list the effect size (E.S.) and 95% confidence interval for the effect size, respectively. For binary items, the effect size is the common odds ratio (COR). You can optionally convert this value to the ETS Delta metric. For polytomous items, the effect size is the standardized P-DIF statistic (sP-DIF).
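
For reference, the standard ETS conversion from a common odds ratio to the Delta metric is delta = -2.35 ln(COR). The sketch below applies that formula. Whether jMetrik applies exactly this expression, and which direction of the odds ratio it uses, is an assumption here, and the function name is hypothetical.

```python
import math


def common_odds_ratio_to_ets_delta(cor):
    """Convert a common odds ratio to the ETS Delta metric.

    Uses the standard conversion delta = -2.35 * ln(COR).
    """
    if cor <= 0:
        raise ValueError("the common odds ratio must be positive")
    return -2.35 * math.log(cor)


for cor in (0.53, 0.65, 1.0, 1.53, 1.89):
    print(f"COR = {cor:.2f} -> delta = {common_odds_ratio_to_ets_delta(cor):.3f}")
```

Under this conversion, odds ratios of 0.65 and 1.53 correspond to an absolute delta of about 1.0, and odds ratios of 0.53 and 1.89 correspond to an absolute delta of about 1.5.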

The final column (“Class”) is the ETS DIF classification level. The possible classifications for binary items are A, B, and C, while the possible classification levels for polytomous items are AA, BB, and CC. A and AA items show little to no DIF. B and BB suggest moderate amounts of DIF. C and CC items suggest a large amount of DIF. These classifications are a function of statistical and practical significance. The rules for binary items are:

A item: (a) Chi-square p-value > 0.05 or (b) the COR is strictly between 0.65 and 1.53.
B item: not an A or C item
C item: (a) COR < 0.53 AND the upper bound of the 95% confidence interval for the COR is less than 0.65, or (b) COR > 1.89 AND the lower bound for the 95% confidence interval for the COR is greater than 1.53.
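
The binary-item rules above can be written as a short decision function. This is a minimal sketch of the rules as stated in this section, not jMetrik's source code; the function and parameter names are hypothetical, and the sign of the classification is not handled.

```python
def classify_binary_item(p_value, cor, ci_lower, ci_upper):
    """Return the ETS class ("A", "B", or "C") for a binary item.

    p_value:  p-value of the chi-square statistic
    cor:      common odds ratio (the effect size)
    ci_lower: lower bound of the 95% confidence interval for the COR
    ci_upper: upper bound of the 95% confidence interval for the COR
    """
    # A item: not statistically significant, or a negligible effect size.
    if p_value > 0.05 or 0.65 < cor < 1.53:
        return "A"
    # C item: a large effect whose confidence interval clears the A boundary.
    if (cor < 0.53 and ci_upper < 0.65) or (cor > 1.89 and ci_lower > 1.53):
        return "C"
    # B item: everything else.
    return "B"


print(classify_binary_item(p_value=0.001, cor=0.40, ci_lower=0.30, ci_upper=0.55))
print(classify_binary_item(p_value=0.010, cor=1.70, ci_lower=1.20, ci_upper=2.40))
```

Checking the A rule first and the C rule second leaves B as the fallback, which matches the definition of a B item as anything that is neither A nor C.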

Given that polytomous items use a different effect size, the classification rules are different. The rules for polytomous items are based on dividing sP-DIF (the value in the output) by the item score range, which limits values to the interval -1 to 1, and then taking the absolute value, which limits the interval to 0 to 1. Call this new value sP-DIF*. It is not displayed in the output. This change allows the rules developed for binary item P-DIF to be applied to polytomous items. The rules for polytomous items are:

AA item: sP-DIF* < 0.05
BB item: 0.05 <= sP-DIF* < 0.10
CC item: sP-DIF* >= 0.10
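
Because sP-DIF* is not shown in the output, it can help to compute it by hand. The sketch below rescales sP-DIF by the item score range and applies the three rules, reading the BB rule as 0.05 <= sP-DIF* < 0.10. The function name and the use of minimum and maximum item scores to obtain the score range are assumptions, and the sign of the classification is not handled.

```python
def classify_polytomous_item(sp_dif, min_score, max_score):
    """Return the ETS class ("AA", "BB", or "CC") for a polytomous item.

    sp_dif:    standardized P-DIF value shown in the output
    min_score: lowest possible item score
    max_score: highest possible item score
    """
    score_range = max_score - min_score
    if score_range <= 0:
        raise ValueError("max_score must be greater than min_score")
    # Rescale to the interval 0 to 1 (sP-DIF* in the text).
    sp_dif_star = abs(sp_dif / score_range)
    if sp_dif_star < 0.05:
        return "AA"
    if sp_dif_star < 0.10:
        return "BB"
    return "CC"


# Hypothetical item scored 0 to 4 with sP-DIF = -0.30, so sP-DIF* = 0.075.
print(classify_polytomous_item(sp_dif=-0.30, min_score=0, max_score=4))
```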

Each DIF classification also includes a sign. A “+” sign (without the quotes) indicates that the item favors the focal group. A “-” sign indicates that the item favors the reference group. More information about the DIF classification levels is available in the following articles.

Dorans, N. J., Schmitt, A. P., & Bleistein, C. A. (1992). The standardization approach to assessing comprehensive differential item functioning. Journal of Educational Measurement, 29, 309-319.

Potenza, M. T., & Dorans, N. J. (1995). DIF assessment for polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19, 23-37.

Zwick, R., & Ercikan, K. (1989). Analysis of differential item functioning in the NAEP history assessment. Journal of Educational Measurement, 26, 55-66.