Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems
- URL: http://arxiv.org/abs/2502.18632v1
- Date: Tue, 25 Feb 2025 20:40:51 GMT
- Title: Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems
- Authors: Zhangqi Duan, Nigel Fernandez, Sri Kanakadandi, Bita Akram, Andrew Lan
- Abstract summary: Knowledge components (KCs) mapped to problems help model student learning, tracking their mastery levels on fine-grained skills. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. A human evaluation shows that the KC tagging produced by our pipeline is reasonably accurate compared to tagging by human domain experts.
- Score: 2.1464087136305774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge components (KCs) mapped to problems help model student learning, tracking students' mastery levels on fine-grained skills and thereby facilitating personalized learning and feedback in online learning platforms. However, crafting and tagging KCs for problems, traditionally performed by human domain experts, is highly labor-intensive. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework that leverages these LLM-generated KCs, which we refer to as KCGen-KT. We conduct extensive quantitative and qualitative evaluations validating the effectiveness of KCGen-KT. On a real-world dataset of student code submissions to open-ended programming problems, KCGen-KT outperforms existing KT methods. We investigate the learning curves of generated KCs and show that LLM-generated KCs have a comparable level of fit to human-written KCs under the performance factor analysis (PFA) model. We also conduct a human evaluation showing that the KC tagging produced by our pipeline is reasonably accurate compared to tagging by human domain experts.
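For context, the PFA model referenced above predicts the probability that a student answers a problem correctly from per-KC parameters and the student's prior practice history. A standard formulation (Pavlik et al., 2009) is shown below; the exact variant used in the paper may differ:

```latex
% sigma is the logistic function, KC(j) the set of KCs tagged to problem j,
% s_{ik} / f_{ik} student i's prior successes / failures on KC k, and
% beta_k, gamma_k, rho_k fitted per-KC parameters.
P(Y_{ij} = 1) = \sigma\!\left( \sum_{k \in \mathrm{KC}(j)} \left( \beta_k + \gamma_k \, s_{ik} + \rho_k \, f_{ik} \right) \right)
```

Level-of-fit comparisons between KC sets are made by fitting this model under each tagging and comparing goodness of fit.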
Related papers
- Sparse Binary Representation Learning for Knowledge Tracing [0.0]
Knowledge tracing (KT) models aim to predict students' future performance based on their historical interactions. Most existing KT models rely exclusively on human-defined knowledge concepts associated with exercises. We propose a KT model, Sparse Binary Representation KT (SBRKT), that generates new KC labels, referred to as auxiliary KCs.
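As a loose illustration of auxiliary KC labels, the sketch below binarizes a low-rank factorization of the student-response matrix; this NMF-based stand-in is an assumption for illustration only, not SBRKT's actual method:

```python
# Toy derivation of sparse binary auxiliary KC labels (illustrative only).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# responses[i, j] = 1 if student i answered exercise j correctly
responses = (rng.random((200, 50)) > 0.5).astype(float)

nmf = NMF(n_components=8, init="nndsvda", random_state=0, max_iter=500)
nmf.fit(responses)
# Each row of components_ scores how strongly exercises load on one
# latent skill; thresholding yields sparse binary KC tags per exercise.
aux_kcs = (nmf.components_ > nmf.components_.mean()).astype(int)  # (8, 50)
print(aux_kcs.sum(axis=0))  # number of auxiliary KCs tagged to each exercise
```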
arXiv Detail & Related papers (2025-01-17T00:45:10Z) - KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly. We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases. Our method uses iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
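A schematic sketch of that iterative self-annotation loop follows; every function here is a toy stand-in invented for illustration, not the paper's training recipe:

```python
# KBAlign-style loop (schematic): self-annotate Q&A pairs and revision
# suggestions from the knowledge base, then train on them, repeatedly.
def generate_qa(model, chunk):
    """Self-annotation: the model writes a Q&A pair about a KB chunk."""
    return (f"What does this passage state? {chunk}", model(chunk))

def fine_tune(model, examples):
    """Stand-in for one fine-tuning round; returns the updated model."""
    return model  # a real implementation would update weights here

def self_adapt(model, knowledge_base, rounds=3):
    for _ in range(rounds):
        qa_pairs = [generate_qa(model, c) for c in knowledge_base]
        # revision suggestions: the model critiques its own answers
        revisions = [(q, model(f"Revise this answer: {a}")) for q, a in qa_pairs]
        model = fine_tune(model, qa_pairs + revisions)
    return model

adapted = self_adapt(lambda text: f"[draft about: {text[:24]}]", ["chunk one", "chunk two"])
```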
arXiv Detail & Related papers (2024-11-22T08:21:03Z) - Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing [59.480951050911436]
We present KCQRL, a framework for automated knowledge concept annotation and question representation learning.
We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets.
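One common recipe for question representation learning is a contrastive objective that pulls questions sharing a knowledge concept together in embedding space; whether KCQRL uses exactly this loss is an assumption here, shown only as a toy illustration:

```python
# Toy InfoNCE contrastive loss over unit-normalized embeddings.
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """Loss is low when anchor is close to positive and far from negatives."""
    pos = np.exp(anchor @ positive / tau)
    neg = np.exp(negatives @ anchor / tau).sum()
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
a = unit(rng.normal(size=16))                # a question embedding
p = unit(a + 0.1 * rng.normal(size=16))      # a question sharing its KC
n = np.stack([unit(rng.normal(size=16)) for _ in range(8)])  # other questions
print(info_nce(a, p, n))
```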
arXiv Detail & Related papers (2024-10-02T16:37:19Z) - Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs [55.317267269115845]
Chain-of-Knowledge (CoK) is a comprehensive framework for knowledge reasoning.
CoK includes methodologies for both dataset construction and model learning.
We conduct extensive experiments with KnowReason, the dataset constructed under this framework.
arXiv Detail & Related papers (2024-06-30T10:49:32Z) - Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions [2.6644846626273457]
We employ GPT-4 to generate KCs for multiple-choice questions (MCQs) in Chemistry and E-Learning.
We analyze discrepancies between the KCs generated by the Large Language Model (LLM) and those written by humans.
We also develop an induction algorithm that clusters questions assessing similar KCs based on their content.
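A minimal sketch of such content-based clustering follows; TF-IDF similarity is a stand-in assumption here, not the paper's induction algorithm:

```python
# Cluster questions whose content suggests they assess similar KCs.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

questions = [
    "Balance the equation for the combustion of methane.",
    "Which coefficients balance C3H8 + O2 -> CO2 + H2O?",
    "Define molar mass and compute it for H2O.",
    "What is the molar mass of CO2?",
]
X = TfidfVectorizer().fit_transform(questions).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)  # questions sharing KC-relevant vocabulary share a cluster id
```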
arXiv Detail & Related papers (2024-05-30T22:57:49Z) - From human experts to machines: An LLM supported approach to ontology and knowledge graph construction [0.0]
Large Language Models (LLMs) have recently gained popularity for their ability to understand and generate human-like natural language.
This work explores the (semi-)automatic construction of KGs facilitated by open-source LLMs.
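As a toy illustration of the kind of LLM-assisted extraction such pipelines build on, the sketch below prompts a model for (subject, relation, object) triples; `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompt format is invented:

```python
# Prompt-based triple extraction for (semi-)automatic KG construction.
import json

PROMPT = (
    "Extract (subject, relation, object) triples from the text below. "
    "Answer with a JSON list of 3-element lists.\n\nText: {text}"
)

def extract_triples(call_llm, text):
    raw = call_llm(PROMPT.format(text=text))  # hypothetical LLM call
    return [tuple(t) for t in json.loads(raw)]

# Usage with a fake LLM so the sketch runs standalone:
fake_llm = lambda _: '[["Marie Curie", "won", "Nobel Prize"]]'
print(extract_triples(fake_llm, "Marie Curie won the Nobel Prize."))
```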
arXiv Detail & Related papers (2024-03-13T08:50:15Z) - A Survey on Knowledge Distillation of Large Language Models [99.11900233108487]
Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities to open-source models.
This paper presents a comprehensive survey of KD's role within the realm of Large Language Models (LLMs).
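For reference, the classic logit-distillation objective (Hinton et al., 2015) that much of this literature builds on combines a hard-label term with a temperature-softened KL term; the LLM distillation methods surveyed in the paper go well beyond this:

```python
# Classic knowledge-distillation loss on a single example's logits.
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    p_t = softmax(teacher_logits, T)                 # softened teacher targets
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))   # soft-target term
    ce = -np.log(softmax(student_logits)[label])     # hard-label term
    return alpha * ce + (1 - alpha) * (T * T) * kl   # T^2 rescales gradients

print(kd_loss(np.array([2.0, 0.5, -1.0]), np.array([3.0, 0.2, -2.0]), label=0))
```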
arXiv Detail & Related papers (2024-02-20T16:17:37Z) - Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering [17.672572064705445]
Large language models (LLMs) equipped with Chain-of-Thought (CoT) have shown impressive reasoning ability in various downstream tasks.
We propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge.
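A schematic of the verify-and-revise idea follows; `retrieve`, `verify`, and `revise` are hypothetical stand-ins, and the paper's actual interaction protocol is more involved:

```python
# KD-CoT-style loop (schematic): check each reasoning step against
# externally retrieved knowledge and replace steps that fail verification.
def verify_and_revise(steps, retrieve, verify, revise):
    checked = []
    for step in steps:
        evidence = retrieve(step)  # query an external knowledge source
        checked.append(step if verify(step, evidence) else revise(step, evidence))
    return checked

# Toy usage: the wrong step is swapped for the retrieved fact.
steps = ["Paris is the capital of Germany.", "Therefore the answer is Paris."]
print(verify_and_revise(
    steps,
    retrieve=lambda s: "Paris is the capital of France.",
    verify=lambda s, e: "Germany" not in s,
    revise=lambda s, e: e,
))
```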
arXiv Detail & Related papers (2023-08-25T09:23:55Z) - KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA).
We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks.
We use both Wikipedia, a corpus on which LLMs are commonly pre-trained, and continuously collected emerging corpora to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z) - Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured as the performance of a k-nearest-neighbor (KNN) classifier trained to predict the labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
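The CL measure as described translates almost directly into code: cluster the representations with K-means, then score a KNN classifier on recovering the cluster labels. The sketch below follows that description; the cluster count, neighbor count, and cross-validation setup are assumptions:

```python
# Cluster Learnability: KNN accuracy at predicting K-means labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
reps = rng.normal(size=(300, 32))  # stand-in for learned representations
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(reps)
cl = cross_val_score(KNeighborsClassifier(n_neighbors=5), reps, labels, cv=5).mean()
print(f"Cluster Learnability ~ {cl:.3f}")  # higher = more learnable representation
```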
arXiv Detail & Related papers (2022-06-02T19:05:13Z)