Validity of Automatic Item Generation for the Basic Competences Exam (Excoba)

Authors

  • María Fabiana Ferreyra, Métrica Educativa
  • Eduardo Backhoff-Escudero, National Institute for Education Evaluation (INEE), Mexico

DOI:

https://doi.org/10.7203/relieve.22.1.8048

Keywords:

Automatic Item Generation, Educational Testing, Construct Validity, Factor Structure, Item Analysis

Abstract

Automatic Item Generation (AIG) is the process of designing and producing test items, and of assembling different versions of an exam that are conceptually and statistically equivalent. AIG tools are built with the assistance of information systems, which makes them very efficient. To this end, GenerEx, an automatic item generation tool, was developed; it is used to automatically generate different versions of the Basic Competences Exam (Excoba). Although AIG represents a great advance for psychological and educational assessment, obtaining validity evidence for the enormous number of items and tests that an automatic process can produce is a methodological challenge. The purpose of this paper is to describe an approach for analyzing the internal structure and psychometric equivalence of the exams generated by GenerEx, and to illustrate the kinds of results it yields. The approach is based on drawing samples of exams from the generation tool, under the assumption that the items and exams it produces must be psychometrically equivalent. The work combines three conceptually different and complementary analyses: Classical Test Theory, Item Response Theory, and Confirmatory Factor Analysis. Results show that GenerEx produces psychometrically similar exams, although there are problems in some learning areas. The methodology proved useful for describing the psychometric functioning of GenerEx and the internal structure of two randomly generated versions of Excoba. The analysis can be complemented with a qualitative study of the deficient items.
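To make the approach concrete, the sketch below illustrates the two ideas the abstract combines: generating parallel items from an item model by sampling its variable slots, and comparing two randomly generated forms with Classical Test Theory statistics (item difficulty as proportion correct, and Cronbach's alpha). This is a minimal Python sketch, not GenerEx's actual implementation: the addition item model, the function names (instantiate, generate_form, cronbach_alpha, simulate_scores), and the Rasch-style response simulation are all illustrative assumptions.

    import math
    import random
    import statistics

    # Hypothetical item model: a stem template plus pools of admissible
    # values for its variable slots (real GenerEx models are richer).
    ADDITION_MODEL = {
        "stem": "What is {a} + {b}?",
        "slots": {"a": range(10, 50), "b": range(10, 50)},
        "key": lambda a, b: a + b,
    }

    def instantiate(model, rng):
        """Draw one concrete item from an item model by sampling its slots."""
        values = {name: rng.choice(pool) for name, pool in model["slots"].items()}
        return model["stem"].format(**values), model["key"](**values)

    def generate_form(models, rng):
        """One exam form = one random instance of each item model."""
        return [instantiate(m, rng) for m in models]

    def cronbach_alpha(scores):
        """CTT internal consistency; scores: rows (examinees) of 0/1 item scores."""
        k = len(scores[0])
        item_var = sum(statistics.pvariance([row[j] for row in scores]) for j in range(k))
        total_var = statistics.pvariance([sum(row) for row in scores])
        return k / (k - 1) * (1 - item_var / total_var)

    def simulate_scores(n_items, n_examinees, rng):
        """Stand-in for administration data: P(correct) = logistic(theta - b)."""
        b = [rng.gauss(0, 1) for _ in range(n_items)]    # item difficulties
        rows = []
        for _ in range(n_examinees):
            theta = rng.gauss(0, 1)                      # examinee ability
            rows.append([int(rng.random() < 1 / (1 + math.exp(bj - theta))) for bj in b])
        return rows

    if __name__ == "__main__":
        rng = random.Random(8048)
        models = [ADDITION_MODEL] * 20                   # 20 items from one model
        form_a = generate_form(models, rng)
        form_b = generate_form(models, rng)
        print("Parallel items:", form_a[0][0], "|", form_b[0][0])
        for name, form in (("Form A", form_a), ("Form B", form_b)):
            scores = simulate_scores(len(form), 500, rng)
            p = statistics.mean(statistics.mean(row[j] for row in scores)
                                for j in range(len(form)))
            print(name, "mean difficulty:", round(p, 3),
                  "alpha:", round(cronbach_alpha(scores), 3))

If generation preserves equivalence, the two forms' summary statistics should come out roughly similar; a study like the one described here would compute them from real examinee responses, and would add IRT calibration and confirmatory factor analysis, rather than relying on this simulation.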

Author Biographies

María Fabiana Ferreyra, Métrica Educativa

Mathematics teacher trained at the Instituto Nacional Superior del Profesorado Joaquín V. González, Buenos Aires, Argentina. She holds a master's degree and a Ph.D. in Education Sciences, both from the Institute for Education Development and Research of the Universidad Autónoma de Baja California, Mexico. Her areas of interest are the development and validation of large-scale learning tests and mathematics teaching. She is currently a research associate at Métrica Educativa, A.C., Mexico. Postal address: Métrica Educativa, Alvarado 921, Zona Centro, Ensenada, Baja California, C.P. 22800, Mexico.

Eduardo Backhoff-Escudero, National Institute for Education Evaluation (INEE) in Mexico

He holds a bachelor's degree in Psychology from the Universidad Nacional Autónoma de México, a master's degree in Education from the University of Washington, and a Ph.D. in Education from the Universidad Autónoma de Aguascalientes. His areas of interest are the development and validation of large-scale learning tests and computer-assisted assessment. He was Director of Tests and Measurement at the National Institute for Education Evaluation (INEE) in Mexico and is currently a member of INEE's Governing Board.


Published

2016-02-16

Issue

Vol. 22 No. 1 (2016)
Section

Research Articles