The Effect of Sample Size, Scale Length, and Group Balance on the Performance of the Generalized Mantel-Haenszel (GMH) Method in Detecting Differential Item Functioning in Graded-Response Items: A Simulation Study

ماجد محمود شريف الجودة*

doi:10.53285/artsep.v7i4.2964

Keywords:

Generalized Mantel-Haenszel (GMH), differential item functioning (DIF), graded response model (GRM), graded-response scales.

Abstract

This study examines how well the generalized Mantel-Haenszel (GMH) method detects differential item functioning (DIF) in graded-response scales. It uses Monte Carlo simulation under the graded response model (GRM) while varying key design factors: sample size, test length, balance of the two group sizes, proportion of DIF items, and DIF type and magnitude. The results showed that detection power increases with DIF magnitude and that GMH is more sensitive to uniform than to nonuniform DIF, while the Type I error rate stays close to the nominal level. Unequal group sizes reduce power, whereas longer tests and larger samples increase it. In practice, GMH performs well when the test is longer, the two groups are similar in size, and the total sample is sufficiently large; when nonuniform DIF is suspected, it is advisable to increase the sample size and use supplementary methods (such as IRT-LR or MIMIC) to strengthen the fairness and accuracy of measurement.
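The design summarized in the abstract can be illustrated with a minimal Python sketch of one simulated replication: graded responses are generated under the GRM for a reference and a focal group, uniform DIF is introduced by shifting one item's thresholds for the focal group, and the GMH chi-square is computed with the total score as the matching variable. This is not the author's code. The function names, the parameter values (discrimination 1.5, thresholds -1, 0, 1, a threshold shift of 1.0, 1000 examinees per group, 10 items), the seed, and the use of the total score for matching are all assumptions chosen for illustration.

```python
import math
import random


def grm_response(theta, a, thresholds, rng):
    """Draw one graded response (0..K-1) under the graded response model."""
    u = rng.random()
    # P(X >= k | theta) is a logistic curve at threshold b_k. With ordered
    # thresholds, the number of cumulative curves above u follows the GRM.
    return sum(u < 1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def simulate_group(n, items, rng, shift=None):
    """items: list of (a, thresholds). shift: {item_index: delta} adds delta
    to every threshold of that item, i.e. uniform DIF (illustrative only)."""
    shift = shift or {}
    data = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        data.append([
            grm_response(theta, a, [b + shift.get(i, 0.0) for b in bs], rng)
            for i, (a, bs) in enumerate(items)
        ])
    return data


def solve(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [row[:] + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gmh_statistic(ref, foc, item, n_cat):
    """GMH chi-square for one item, stratified by total score.
    Returns (statistic, degrees of freedom)."""
    q = n_cat - 1  # the last category is redundant given the margins
    diff = [0.0] * q
    cov = [[0.0] * q for _ in range(q)]
    strata = {}
    for g, data in enumerate((ref, foc)):
        for row in data:
            table = strata.setdefault(sum(row), [[0] * n_cat, [0] * n_cat])
            table[g][row[item]] += 1
    for ref_counts, foc_counts in strata.values():
        n_r, n_f = sum(ref_counts), sum(foc_counts)
        n = n_r + n_f
        if n < 2 or n_r == 0 or n_f == 0:
            continue  # stratum carries no information about group differences
        col = [r + f for r, f in zip(ref_counts, foc_counts)]
        scale = n_r * n_f / (n * n * (n - 1))
        for j in range(q):
            # observed minus expected reference-group count in category j
            diff[j] += ref_counts[j] - n_r * col[j] / n
            for k in range(q):
                # multivariate hypergeometric covariance of the counts
                cov[j][k] += scale * col[j] * ((n if j == k else 0) - col[k])
    x = solve(cov, diff)
    return sum(d * v for d, v in zip(diff, x)), q


rng = random.Random(2024)
items = [(1.5, [-1.0, 0.0, 1.0])] * 10           # 10 four-category items
ref = simulate_group(1000, items, rng)
foc = simulate_group(1000, items, rng, shift={0: 1.0})  # uniform DIF on item 0
stat_dif, df = gmh_statistic(ref, foc, item=0, n_cat=4)
stat_clean, _ = gmh_statistic(ref, foc, item=5, n_cat=4)
print(f"DIF item:   GMH = {stat_dif:.2f} (df = {df})")
print(f"clean item: GMH = {stat_clean:.2f} (df = {df})")
```

With four response categories the statistic is referred to a chi-square distribution with 3 degrees of freedom, whose 0.05 critical value is 7.815. Repeating the replication many times and counting how often a DIF item exceeds that value estimates power; doing the same for a DIF-free item estimates the Type I error rate. Unequal group sizes, other test lengths, or a change in discrimination for the focal group (nonuniform DIF) can be explored by changing the inputs to the hypothetical `simulate_group` helper.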
