Differential Item Functioning in the PISA Project: Detection and Understanding

Authors

  • Paula Elosua, Euskal Herriko Unibertsitatea - Universidad del País Vasco

DOI:

https://doi.org/10.7203/relieve.12.2.4229

Keywords:

Differential Item Functioning, PISA, Mantel-Haenszel, Logistic Regression, Polytomous DIF, Test adaptation

Abstract

This report analyses differential item functioning (DIF) in the Programme for International Student Assessment (PISA) 2000. The items studied come from the Reading Comprehension Test. We analysed the items released from that administration so that DIF detection could be combined with an examination of its possible causes. The reference group is the United Kingdom sample and the focal group is the Spanish sample. The detection procedures are Mantel-Haenszel, logistic regression, and the standardized mean difference, together with their extensions for polytomous items. Two items were flagged, and the post-hoc analysis did not fully explain the causes of DIF.
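As an illustration of the first procedure named above, the following is a minimal sketch of the Mantel-Haenszel common odds ratio for a dichotomous item, assuming examinees have already been grouped into strata by total test score. The function name `mantel_haenszel_dif`, the tuple layout, and the counts in the usage lines are hypothetical and are not taken from the PISA data or from the report's own computations. The delta value uses the conventional ETS rescaling, -2.35 times the natural log of the odds ratio.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (alpha_MH, delta_MH) for one dichotomous item.

    `strata` is a list of 2x2 tables, one per matched score level, each given
    as (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    This layout is an assumption made for the sketch.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        total = ref_right + ref_wrong + foc_right + foc_wrong
        if total == 0:
            continue  # an empty score level carries no information
        numerator += ref_right * foc_wrong / total
        denominator += ref_wrong * foc_right / total
    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio is undefined for these tables")
    alpha = numerator / denominator
    # ETS delta metric: negative values indicate an item that is relatively
    # harder for the focal group at matched score levels.
    delta = -2.35 * math.log(alpha)
    return alpha, delta


# Hypothetical counts, for illustration only.
no_dif = [(30, 10, 15, 5), (20, 20, 10, 10)]   # equal odds in every stratum
with_dif = [(40, 10, 20, 20)]                  # reference odds 4x focal odds

alpha_none, delta_none = mantel_haenszel_dif(no_dif)
alpha_dif, delta_dif = mantel_haenszel_dif(with_dif)
```

When the odds of a correct response are equal for both groups within every stratum, the ratio is 1 and delta is 0. A ratio above 1 means the item is easier for the reference group among examinees with matched scores.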

Section

Research Articles