Analyzing-Evaluating-Creating: Assessing Computational Thinking and Problem Solving in Visual Programming Domains
- URL: http://arxiv.org/abs/2403.12227v1
- Date: Mon, 18 Mar 2024 20:18:34 GMT
- Title: Analyzing-Evaluating-Creating: Assessing Computational Thinking and Problem Solving in Visual Programming Domains
- Authors: Ahana Ghosh, Liina Malva, Adish Singla,
- Abstract summary: Computational thinking (CT) and problem-solving skills are increasingly integrated into K-8 school curricula worldwide.
We have developed ACE, a novel test focusing on the three higher cognitive levels in Bloom's taxonomy.
We evaluate the psychometric properties of ACE through a study conducted with 371 students in grades 3-7 from 10 schools.
- Score: 21.14335914575035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computational thinking (CT) and problem-solving skills are increasingly integrated into K-8 school curricula worldwide. Consequently, there is a growing need to develop reliable assessments for measuring students' proficiency in these skills. Recent works have proposed tests for assessing these skills across various CT concepts and practices, in particular, based on multi-choice items enabling psychometric validation and usage in large-scale studies. Despite their practical relevance, these tests are limited in how they measure students' computational creativity, a crucial ability when applying CT and problem solving in real-world settings. In our work, we have developed ACE, a novel test focusing on the three higher cognitive levels in Bloom's Taxonomy, i.e., Analyze, Evaluate, and Create. ACE comprises a diverse set of 7x3 multi-choice items spanning these three levels, grounded in elementary block-based visual programming. We evaluate the psychometric properties of ACE through a study conducted with 371 students in grades 3-7 from 10 schools. Based on several psychometric analysis frameworks, our results confirm the reliability and validity of ACE. Our study also shows a positive correlation between students' performance on ACE and performance on Hour of Code: Maze Challenge by Code.org.
Related papers
- CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance.
We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions.
Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios
arXiv Detail & Related papers (2024-10-17T04:52:57Z) - Evaluation of OpenAI o1: Opportunities and Challenges of AGI [112.0812059747033]
o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance.
The model excelled in tasks requiring intricate reasoning and knowledge integration across various fields.
Overall results indicate significant progress towards artificial general intelligence.
arXiv Detail & Related papers (2024-09-27T06:57:00Z) - Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation [16.198003101055264]
This study adopts brain functional connectivity with electroencephalography signals to capture associations in brain regions across multiple subjects for evaluating real-time cognitive states.
Thirty subjects are acquired for analysis and evaluation. The results are interpreted through different perspectives, including inner-subject and cross-subject for task-wise and gender-wise underlying brain functional connectivity.
arXiv Detail & Related papers (2024-08-27T12:51:59Z) - The virtual CAT: A tool for algorithmic thinking assessment in Swiss compulsory education [0.0]
This paper introduces the virtual Cross Array Task (CAT), a digital adaptation of an unplugged assessment activity designed to evaluate algorithmic skills in Swiss compulsory education.
The platform offers scalable and automated assessment, reducing human involvement and mitigating potential data collection errors.
The findings show the platform's usability, proficiency and suitability for assessing AT skills among students of diverse ages, development stages, and educational backgrounds.
arXiv Detail & Related papers (2024-08-02T13:36:17Z) - Evaluating Human-AI Collaboration: A Review and Methodological Framework [4.41358655687435]
The use of artificial intelligence (AI) in working environments with individuals, known as Human-AI Collaboration (HAIC), has become essential.
evaluating HAIC's effectiveness remains challenging due to the complex interaction of components involved.
This paper provides a detailed analysis of existing HAIC evaluation approaches and develops a fresh paradigm for more effectively evaluating these systems.
arXiv Detail & Related papers (2024-07-09T12:52:22Z) - Disentangling Heterogeneous Knowledge Concept Embedding for Cognitive Diagnosis on Untested Knowledge [24.363775475487117]
We propose a novel framework for Cognitive Diagnosis called Disentangling Heterogeneous Knowledge Cognitive Diagnosis(DisKCD)
We leverage course grades, exercise questions, and learning resources to learn the potential representations of students, exercises, and knowledge concepts.
We construct a heterogeneous relation graph network via students, exercises, tested knowledge concepts(TKCs), and UKCs.
arXiv Detail & Related papers (2024-05-25T01:49:54Z) - Survey of Computerized Adaptive Testing: A Machine Learning Perspective [66.26687542572974]
Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees.
This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing method.
arXiv Detail & Related papers (2024-03-31T15:09:47Z) - Multi-Dimensional Evaluation of Text Summarization with In-Context
Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z) - UKP-SQuARE: An Interactive Tool for Teaching Question Answering [61.93372227117229]
The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course.
We introduce UKP-SQuARE as a platform for QA education.
Students can run, compare, and analyze various QA models from different perspectives.
arXiv Detail & Related papers (2023-05-31T11:29:04Z) - The competent Computational Thinking test (cCTt): a valid, reliable and gender-fair test for longitudinal CT studies in grades 3-6 [0.06282171844772422]
This study investigated whether the competent Computational Thinking test (cCTt) could evaluate learning reliably from grades 3 to 6 (ages 7-11) using data from 2709 students.
The findings indicate that the cCTt is valid, reliable and gender-fair for grades 3-6, although more complex items would be beneficial for grades 5-6.
arXiv Detail & Related papers (2023-05-31T03:29:04Z) - Quiz-based Knowledge Tracing [61.9152637457605]
Knowledge tracing aims to assess individuals' evolving knowledge states according to their learning interactions.
QKT achieves state-of-the-art performance compared to existing methods.
arXiv Detail & Related papers (2023-04-05T12:48:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.