The Effect of Sample Size and Test Length on the Accuracy of Item Parameter Estimation under the Generalized Partial Credit Model and the Delta-Scoring Model

دلال عبد الرحمن محمد العويدي*; أ.د إسماعيل بن سلامة البرصان**

doi:10.53285/artsep.v7i2.2628

Keywords:

Effect of sample size, test length, accuracy of item parameter estimation, generalized partial credit model, delta-scoring model

Abstract

This study examines the effect of sample size and test length on the accuracy of item parameter estimation under the generalized partial credit model (GPCM) and the delta-scoring model (DSM). A simulation-based experimental design was used: data were generated for three sample sizes (500, 1000, 5000) and three test lengths (5, 10, 15 items) and analyzed with R and the Delta program. Estimation accuracy was evaluated with three indices: bias, mean squared error, and the correlation coefficient. A three-way ANOVA was then run in SPSS to test the significance of differences in estimation accuracy across the study conditions and their interactions. The results showed that the GPCM outperformed the DSM in estimating both the response-category difficulties and the discrimination parameter. Model type was the only factor with a statistically significant effect on the accuracy of the category-difficulty estimates; sample size, test length, and their interaction had no significant effect. For the discrimination parameter, model and test length had significant effects, whereas sample size did not. Among the interactions, only the one between sample size and test length was significant.
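The simulation design summarized in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' code (the study itself used R and the Delta program). It rests on several assumptions that the abstract does not state: the common GPCM parameterization with one discrimination `a` and a set of step difficulties per item, abilities drawn from a standard normal distribution, bias defined as the mean of (estimate minus true value), and a Pearson correlation between true and estimated values. All function names and generating values are hypothetical.

```python
import math
import random


def gpcm_probs(theta, a, steps):
    """Category probabilities for one item under an assumed GPCM form.

    theta: ability; a: discrimination; steps: step difficulties b_1..b_m.
    Returns m + 1 probabilities for scores 0..m.
    """
    # Cumulative sums of a * (theta - b_k); the score-0 term is fixed at 0.
    cumulative = [0.0]
    for b in steps:
        cumulative.append(cumulative[-1] + a * (theta - b))
    peak = max(cumulative)  # subtract the max for numerical stability
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_responses(n_persons, items, rng):
    """Draw a persons-by-items score matrix; abilities assumed N(0, 1)."""
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for a, steps in items:
            probs = gpcm_probs(theta, a, steps)
            row.append(rng.choices(range(len(probs)), weights=probs)[0])
        data.append(row)
    return data


def accuracy_indices(true_values, estimates):
    """Bias, mean squared error, and Pearson correlation of estimates."""
    n = len(true_values)
    errors = [e - t for t, e in zip(true_values, estimates)]
    bias = sum(errors) / n
    mse = sum(err * err for err in errors) / n
    mean_t = sum(true_values) / n
    mean_e = sum(estimates) / n
    cov = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(true_values, estimates))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    corr = cov / math.sqrt(var_t * var_e)
    return bias, mse, corr


# Hypothetical generating parameters: 5 items, 4 score categories each.
rng = random.Random(2024)
items = [(rng.uniform(0.8, 2.0), [rng.gauss(0.0, 1.0) for _ in range(3)])
         for _ in range(5)]
responses = simulate_responses(500, items, rng)

# Illustrative only: made-up "estimates" stand in for real calibration output.
true_a = [a for a, _ in items]
fake_est_a = [a + 0.1 for a in true_a]
bias, mse, corr = accuracy_indices(true_a, fake_est_a)
```

In a full replication, one such data set would be generated for each of the nine combinations of sample size (500, 1000, 5000) and test length (5, 10, 15), each would be calibrated under both models, and `accuracy_indices` would be applied to the resulting estimates in place of the placeholder values used here.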
