Effect of Sample Size and Test Length on Item Parameter Estimation Accuracy according to Generalized Partial Credit Model and the Delta Scoring Model

Authors

  • Dalal Abdulrahman Mohammed Al-Owaidi
  • Prof. Dr. Ismail bin Salamah Al-Bursan

DOI:

https://doi.org/10.53285/artsep.v7i2.2628

Keywords:

Sample size effect, test length, accuracy of item parameter estimation, Generalized Partial Credit Model (GPCM), Delta Scoring Model (DSM)

Abstract

This study aims to examine the effects of sample size and test length on item parameter estimation under the Generalized Partial Credit Model (GPCM) and the Delta Scoring Model (DSM). A comparative analytical approach based on simulation was employed, wherein simulated data were generated for varying sample sizes (500, 1000, 5000) and test lengths (5, 10, 15 items). The data were analyzed using R and Delta software. Estimation accuracy was evaluated using three indicators: bias, mean square error (MSE), and the correlation coefficient. A three-way ANOVA was conducted using SPSS to assess the statistical significance of differences in estimation accuracy across study conditions and their interactions. Results demonstrated the superiority of the GPCM over the DSM in the accuracy of estimating response category difficulty and discrimination parameters. Model type was the only statistically significant factor affecting the accuracy of response category difficulty estimation, whereas neither sample size, test length, nor their interaction exhibited significant effects. For the discrimination parameter, both the model and test length showed significant effects, while sample size did not. Additionally, a statistically significant interaction was observed between sample size and test length.

References

Bani Ata, Z. S. I. (2017). Investigating the effect of test length and sample size on the accuracy of item parameter and person ability estimation methods in the BILOG program. International Journal of Research in Education and Psychology, 5(2), 579–606.

De Ayala, R. (2017). The theory and practice of item response theory (Arabic translation). King Saud University Press. https://doi.org/10.33948/1158-030-002-008

Dudin, H. M. (2009). Advanced statistical data analysis using SPSS. Amman: Dar Al-Hamid for Publishing and Distribution.

Arabic References

Banī ʻAṭā, Z. Ṣ. I. (2017). Taqaṣṣī Athar Ṭūl al-ikhtibār wa-ḥajm al-ʻayyinah ʻalá diqqat Ṭuruq taqdīr Maʻālim al-faqarāt wa-qudrāt al-afrād fī Barnāmaj BILOG. al-Majallah al-Dawlīyah lil-Baḥth fī al-Tarbiyah wa-ʻilm al-nafs, 5(2), 579–606.

Dī Ayālā, Ār. (2017). al-Naẓarīyah wa-al-taṭbīq fī Naẓarīyat al-istijābah lil-faqrah. Dār Jāmiʻat al-Malik Saʻūd. https://doi.org/10.33948/1158-030-002-008

Dūdīn, Ḥ. M. (2009). al-Taḥlīl al-iḥṣāʼī al-mutaqaddim lil-bayānāt bi-istikhdām SPSS. ʻAmmān: Dār al-Ḥāmid lil-Nashr wa-al-Tawzīʻ.

English References

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561–573. https://doi.org/10.1007/BF03037732

Auné, S. E., Abal, F. J. P., & Attorresi, H. F. (2020). A psychometric analysis from the Item Response Theory: Step-by-step modelling of a Loneliness Scale. Ciencias Psicológicas, 14(1), e-2179. https://doi.org/10.22235/cp.v14i1.2179

Bock, R. D., & Lieberman, M. (1970). Fitting a response model for dichotomously scored items. Psychometrika, 35(2), 179–197. https://doi.org/10.1007/BF02291262

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.

Dai, S., Vo, T. T., Kehinde, O. J., He, H., Xue, Y., Demir, C., & Wang, X. (2021). Performance of Polytomous IRT Models With Rating Scale Data: An Investigation Over Sample Size, Instrument Length, and Missing Data. Frontiers in Education, 6. https://doi.org/10.3389/feduc.2021.721963

Dimitrov, D. M. (2016). An approach to scoring and equating tests with binary items: Piloting with large-scale assessments. Educational and Psychological Measurement, 76(6), 954–975. https://doi.org/10.1177/0013164416631100

Dimitrov, D. M., & Alsadaawi, A. (2018). Psychometric features of the General Teacher Test under the D-scoring model: The case of teacher certification assessment in Saudi Arabia. World Journal of Social Science Research, 5(2), 107–122. https://doi.org/10.22158/wjssr.v5n2p107

Dimitrov, D. M., & Atanasov, D. V. (2021). Latent D-scoring modeling: Estimation of item and person parameters. Educational and Psychological Measurement, 81(2), 388–404. https://doi.org/10.1177/0013164420941147

Dimitrov, D. M., & Luo, Y. (2019). A note on the D-scoring method adapted for polytomous test items. Educational and Psychological Measurement, 79(3), 545–557. https://doi.org/10.1177/0013164418786014

Dimitrov, D. M., Atanasov, D. V., & Luo, Y. (2020). Person-fit assessment under the D-scoring method. Measurement: Interdisciplinary Research and Perspectives, 18(3), 111–123. https://doi.org/10.1080/15366367.2020.1725733

Djidu, H., Retnawati, H., & Haryanto, H. (2023). Ensuring parameter estimation accuracy in 3PL IRT modeling: The role of test length and sample size. JP3I (Jurnal Pengukuran Psikologi dan Pendidikan Indonesia), 12(2), 177–190. https://doi.org/10.15408/jp3i.v12i2.34130

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates Publishers.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Kluwer-Nijhoff Publishing.

Han, Z., Sinharay, S., Johnson, M. S., & Liu, X. (2023). The standardized S-X2 statistic for assessing item fit. Applied Psychological Measurement, 47(1), 3–18. https://doi.org/10.1177/01466216221108077

Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology, 7, Article 109. https://doi.org/10.3389/fpsyg.2016.00109

Lord, F. M. (1952). A theory of test scores (Psychometric Monograph No. 7). Psychometric Society.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272

Mills, C. N. (2002). Computerized simulation in research and testing. Applied Psychological Measurement, 26(3), 217–231. https://doi.org/10.1177/0146621602026003003

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206

Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24(1), 50–64. https://doi.org/10.1177/01466216000241003

Şahin, A., & Anıl, D. (2017). The effects of test length and sample size on item parameters in item response theory. Kuram ve Uygulamada Eğitim Bilimleri, 17(1), 321–335. https://doi.org/10.12738/estp.2017.1.0270

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometrika Monograph Supplement No. 17). Psychometric Society.

Shen, L. (1997, March). Quantifying item dependency by Fisher’s Z. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.


Published

2025-06-04

How to Cite

Al-Owaidi, D. A. M., & Al-Bursan, I. bin S. (2025). Effect of Sample Size and Test Length on Item Parameter Estimation Accuracy according to Generalized Partial Credit Model and the Delta Scoring Model. Arts for Educational & Psychological Studies, 7(2), 89–125. https://doi.org/10.53285/artsep.v7i2.2628
