The competent Computational Thinking test (cCTt): a valid, reliable and
gender-fair test for longitudinal CT studies in grades 3-6
- URL: http://arxiv.org/abs/2305.19526v1
- Date: Wed, 31 May 2023 03:29:04 GMT
- Title: The competent Computational Thinking test (cCTt): a valid, reliable and
gender-fair test for longitudinal CT studies in grades 3-6
- Authors: Laila El-Hamamsy, María Zapata-Cáceres, Estefanía Martín-Barroso,
Francesco Mondada, Jessica Dehler Zufferey, Barbara Bruno,
Marcos Román-González
- Abstract summary: This study investigated whether the competent Computational Thinking test (cCTt) could evaluate learning reliably from grades 3 to 6 (ages 7-11).
The findings indicate that the cCTt is valid, reliable and gender-fair for grades 3-6, although more complex items would be beneficial for grades 5-6.
- Score: 0.7896843467339624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The introduction of computing education into curricula worldwide requires
multi-year assessments to evaluate the long-term impact on learning. However,
no single Computational Thinking (CT) assessment spans primary school, and no
group of CT assessments provides a means of transitioning between instruments.
This study therefore investigated whether the competent CT test (cCTt) could
evaluate learning reliably from grades 3 to 6 (ages 7-11) using data from 2709
students. The psychometric analysis employed Classical Test Theory, normalised
z-scoring, Item Response Theory (including Differential Item Functioning), and
PISA's methodology to establish proficiency levels. The findings indicate that
the cCTt is valid, reliable and gender-fair for grades 3-6, although more
complex items would be beneficial for grades 5-6. Grade-specific proficiency
levels are provided to help tailor interventions, with a normalised scoring
system to compare students across and between grades, and help establish
transitions between instruments. To improve the utility of CT assessments among
researchers, educators and practitioners, the findings emphasise the importance
of i) developing and validating gender-fair, grade-specific instruments
aligned with students' cognitive maturation, ii) providing proficiency
levels, and iii) establishing equivalency scales to transition between assessments. To
conclude, the study provides insight into the design of longitudinal
developmentally appropriate assessments and interventions.
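As a hedged illustration of the normalised z-scoring the abstract mentions, the sketch below standardises raw scores within each grade so that students can be compared across and between grades. All numbers (raw scores, grade groups) are invented for illustration and are not the paper's data or exact procedure.

```python
# Minimal sketch of grade-wise normalised z-scoring (all values invented,
# not the paper's data or exact procedure).
from statistics import mean, stdev

def grade_z_scores(scores_by_grade):
    """Standardise each student's raw score within their own grade."""
    z = {}
    for grade, scores in scores_by_grade.items():
        m, s = mean(scores), stdev(scores)
        z[grade] = [(x - m) / s for x in scores]
    return z

# Hypothetical raw test scores for five students in each of two grades.
raw = {3: [10, 12, 14, 16, 18], 5: [15, 17, 19, 21, 23]}
z = grade_z_scores(raw)
# The top scorer in each grade ends up with the same z-score (about 1.26),
# which is what makes scores comparable across grades.
```

Standardising within grade removes the grade-level difficulty shift, so a z-score expresses how a student performed relative to their own cohort.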
Related papers
- NLP and Education: using semantic similarity to evaluate filled gaps in a large-scale Cloze test in the classroom [0.0]
Using data from Cloze tests administered to students in Brazil, word embedding (WE) models for Brazilian Portuguese (PT-BR) were employed to measure semantic similarity.
A comparative analysis between the WE models' scores and the judges' evaluations revealed that GloVe was the most effective model.
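Semantic similarity between a student's filled gap and the expected word is commonly computed as the cosine similarity of their embedding vectors. The sketch below uses tiny invented 3-dimensional vectors rather than real GloVe embeddings (which typically have 50-300 dimensions).

```python
# Toy sketch of embedding-based scoring of Cloze-gap answers. The 3-d
# vectors are invented for illustration; real GloVe vectors are much larger.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical PT-BR embeddings (invented values).
emb = {
    "casa":  [0.9, 0.1, 0.2],  # "house" (expected answer)
    "lar":   [0.8, 0.2, 0.3],  # "home" -- near-synonym
    "carro": [0.1, 0.9, 0.4],  # "car" -- unrelated
}
# A near-synonym fill scores far higher than an unrelated one.
good = cosine(emb["casa"], emb["lar"])
bad = cosine(emb["casa"], emb["carro"])
```

Scoring a gap this way rewards semantically acceptable answers even when they are not the exact expected word, which is the point of using embeddings rather than exact-match grading.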
arXiv Detail & Related papers (2024-11-02T15:22:26Z)
- CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design [15.2100541345819]
CTBench is introduced as a benchmark to assess language models (LMs) in aiding clinical study design.
It consists of two datasets: "CT-Repo," containing baseline features from 1,690 clinical trials sourced from clinicaltrials.gov, and "CT-Pub," a subset of 100 trials with more comprehensive baseline features gathered from relevant publications.
arXiv Detail & Related papers (2024-06-25T18:52:48Z)
- Wearable Device-Based Real-Time Monitoring of Physiological Signals: Evaluating Cognitive Load Across Different Tasks [6.673424334358673]
This study employs cutting-edge wearable monitoring technology to assess cognitive load from electroencephalogram (EEG) data of secondary vocational students.
The research examines the value of these devices for assessing cognitive load and their utility across various tasks.
arXiv Detail & Related papers (2024-06-11T10:48:26Z)
- ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can lead fluency to be conflated with truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert scales.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z)
- Survey of Computerized Adaptive Testing: A Machine Learning Perspective [66.26687542572974]
Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees.
This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing method.
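A core idea in CAT, sketched below under the Rasch (1PL) model, is to administer the unanswered item with maximum Fisher information at the current ability estimate. The item difficulties and ability value are illustrative; real CAT systems also re-estimate ability after each response and apply stopping rules.

```python
# One item-selection step of computerized adaptive testing under the
# Rasch (1PL) model. Difficulties and the ability value are illustrative.
import math

def p_correct(theta, b):
    """Rasch model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fisher_info(theta, b):
    """Item information p(1 - p); largest when difficulty matches ability."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, administered):
    """Pick the unadministered item with maximum information at theta."""
    candidates = [b for i, b in enumerate(difficulties) if i not in administered]
    return max(candidates, key=lambda b: fisher_info(theta, b))

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
# At an ability estimate of 0.9, the most informative remaining item is
# the one with difficulty 1.0.
chosen = next_item(0.9, difficulties, administered={0})
```

Selecting items whose difficulty tracks the running ability estimate is what makes adaptive tests shorter than fixed-form tests at equal precision.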
arXiv Detail & Related papers (2024-03-31T15:09:47Z)
- Analyzing-Evaluating-Creating: Assessing Computational Thinking and Problem Solving in Visual Programming Domains [21.14335914575035]
Computational thinking (CT) and problem-solving skills are increasingly integrated into K-8 school curricula worldwide.
We have developed ACE, a novel test focusing on the three higher cognitive levels in Bloom's taxonomy.
We evaluate the psychometric properties of ACE through a study conducted with 371 students in grades 3-7 from 10 schools.
arXiv Detail & Related papers (2024-03-18T20:18:34Z)
- AutoTrial: Prompting Language Models for Clinical Trial Design [53.630479619856516]
We present a method named AutoTrial to aid the design of clinical eligibility criteria using language models.
Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts.
arXiv Detail & Related papers (2023-05-19T01:04:16Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- The competent Computational Thinking test (cCTt): Development and validation of an unplugged Computational Thinking test for upper primary school [0.8367620276482053]
The competent CT test (cCTt) is an unplugged CT test targeting 7-9 year-old students.
The expert evaluation indicates that the cCTt shows good face, construct, and content validity.
The psychometric analysis of the student data demonstrates adequate reliability, difficulty, and discriminability.
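Two of the classical psychometric quantities mentioned here, item difficulty (proportion correct) and reliability (Cronbach's alpha), can be sketched as follows. The dichotomous response matrix is invented for illustration and is not the cCTt data.

```python
# Sketch of item difficulty (proportion correct) and Cronbach's alpha.
# The response matrix (rows = students, columns = items) is invented.
from statistics import pvariance

def difficulty(responses, item):
    """Classical difficulty index: fraction of students answering correctly."""
    return sum(row[item] for row in responses) / len(responses)

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
```

On this toy matrix the first item has difficulty 0.8 (easy, most students pass) and the test's alpha is 0.8, which would conventionally be read as adequate internal consistency.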
arXiv Detail & Related papers (2022-03-11T15:05:35Z)
- Opportunities of a Machine Learning-based Decision Support System for Stroke Rehabilitation Assessment [64.52563354823711]
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practice relies mainly on the therapist's experience, and assessment is performed infrequently due to limited therapist availability.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
arXiv Detail & Related papers (2020-02-27T17:04:07Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
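Importance sampling is a standard building block of the off-policy evaluation discussed in the last entry: logged rewards are reweighted by the ratio of target- to behaviour-policy probabilities. A toy one-step (bandit) version, with invented policies and logged data:

```python
# Toy one-step importance-sampling estimator for off-policy evaluation.
# Policies and logged data are invented; real uses extend this to
# multi-step trajectories and add variance-control techniques.

def is_estimate(logged, behaviour, target):
    """Mean of rho * reward, with rho = target(a) / behaviour(a)."""
    total = 0.0
    for action, reward in logged:
        rho = target[action] / behaviour[action]
        total += rho * reward
    return total / len(logged)

behaviour = {"a": 0.5, "b": 0.5}  # policy that generated the logs
target = {"a": 0.9, "b": 0.1}     # policy we want to evaluate
logged = [("a", 1.0), ("b", 0.0), ("a", 1.0), ("b", 0.0)]
# Estimated value of the target policy from behaviour-policy data: 0.9
```

The high variance of these ratio-weighted estimates on limited, noisy data is exactly why the paper above argues that confidence intervals alone can be insufficient.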
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.