Toward Semi-Automatic Misconception Discovery Using Code Embeddings
- URL: http://arxiv.org/abs/2103.04448v1
- Date: Sun, 7 Mar 2021 20:32:41 GMT
- Title: Toward Semi-Automatic Misconception Discovery Using Code Embeddings
- Authors: Yang Shi, Krupal Shah, Wengran Wang, Samiha Marwan, Poorvaja Penmetsa, and Thomas W. Price
- Abstract summary: We present a novel method for the semi-automated discovery of problem-specific misconceptions from students' program code in computing courses.
We trained the model on a block-based programming dataset and used the learned embedding to cluster incorrect student submissions.
- Score: 4.369757255496184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding students' misconceptions is important for effective teaching
and assessment. However, discovering such misconceptions manually can be
time-consuming and laborious. Automated misconception discovery can address
these challenges by highlighting patterns in student data, which domain experts
can then inspect to identify misconceptions. In this work, we present a novel
method for the semi-automated discovery of problem-specific misconceptions from
students' program code in computing courses, using a state-of-the-art code
classification model. We trained the model on a block-based programming dataset
and used the learned embedding to cluster incorrect student submissions. We
found these clusters correspond to specific misconceptions about the problem
and would not have been easily discovered with existing approaches. We also
discuss potential applications of our approach and how these misconceptions
inform domain-specific insights into students' learning processes.
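The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so a plain k-means with a deterministic farthest-point initialization is swapped in here, and the submission IDs and 2-D embedding vectors are hypothetical placeholders for vectors that a trained code classification model would produce.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20):
    """Plain k-means; a stand-in, not the method used in the paper."""
    # Deterministic farthest-point initialization: start from the first
    # point, then repeatedly add the point farthest from existing centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                  for p in points]
        # Move every center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels


# Hypothetical embeddings of incorrect submissions (assumed values).
embeddings = {
    "s1": [0.1, 0.2],
    "s2": [0.0, 0.1],
    "s3": [0.2, 0.1],
    "s4": [5.0, 5.1],
    "s5": [5.2, 4.9],
}

ids = list(embeddings)
labels = kmeans([embeddings[i] for i in ids], k=2)

# Group submission IDs by cluster so an expert can inspect each group.
clusters = {}
for sid, lab in zip(ids, labels):
    clusters.setdefault(lab, []).append(sid)

print(clusters)
```

Each resulting group is only a candidate: in the semi-automated workflow the abstract describes, a domain expert would still read the submissions in a cluster to decide whether they share a misconception.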
Related papers
- RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints.
Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models.
We propose the RESTOR framework for machine unlearning based on the following dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
- LLM-based Cognitive Models of Students with Misconceptions [55.29525439159345]
This paper investigates whether Large Language Models (LLMs) can be instruction-tuned to meet this dual requirement.
We introduce MalAlgoPy, a novel Python library that generates datasets reflecting authentic student solution patterns.
Our insights enhance our understanding of AI-based student models and pave the way for effective adaptive learning systems.
arXiv Detail & Related papers (2024-10-16T06:51:09Z)
- Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing [59.480951050911436]
We present KCQRL, a framework for automated knowledge concept annotation and question representation learning.
We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets.
arXiv Detail & Related papers (2024-10-02T16:37:19Z)
- Counterfactual Explanations for Clustering Models [11.40145394568897]
Clustering algorithms rely on complex optimisation processes that may be difficult to comprehend.
We propose a new, model-agnostic technique for explaining clustering algorithms with counterfactual statements.
arXiv Detail & Related papers (2024-09-19T10:05:58Z)
- An Approach to Detect Abnormal Submissions for CodeWorkout Dataset [8.142354661558752]
This paper presents a preliminary study to analyze log data with anomalies.
The goal of our work is to overcome the abnormal instances when modeling personalizable recommendations in programming learning environments.
arXiv Detail & Related papers (2024-06-28T00:26:15Z)
- Creating a Trajectory for Code Writing: Algorithmic Reasoning Tasks [0.923607423080658]
This paper describes instruments and the machine learning models used for validating them.
We have used the data collected in an introductory programming course in the penultimate week of the semester.
Preliminary research suggests ART type instruments can be combined with specific machine learning models to act as an effective learning trajectory.
arXiv Detail & Related papers (2024-04-03T05:07:01Z)
- Automatic Classification of Error Types in Solutions to Programming Assignments at Online Learning Platform [4.028503203417233]
We apply machine learning methods to improve the feedback of automated verification systems for programming assignments.
We detect frequent error types by clustering previously submitted incorrect solutions, label these clusters and use this labeled dataset to identify the type of an error in a new submission.
arXiv Detail & Related papers (2021-07-13T11:59:57Z)
- Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- Knowledge as Invariance -- History and Perspectives of Knowledge-augmented Machine Learning [69.99522650448213]
Research in machine learning is at a turning point.
Research interests are shifting away from increasing the performance of highly parameterized models to exceedingly specific tasks.
This white paper provides an introduction and discussion of this emerging field in machine learning research.
arXiv Detail & Related papers (2020-12-21T15:07:19Z)
- A Survey of Machine Learning Methods and Challenges for Windows Malware Classification [43.4550536920809]
This survey aims to be useful both to cybersecurity practitioners who wish to learn more about how machine learning can be applied to the malware problem and to data scientists who need background on the challenges in this uniquely complicated space.
arXiv Detail & Related papers (2020-06-15T17:46:12Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.