Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems
- URL: http://arxiv.org/abs/2502.18632v1
- Date: Tue, 25 Feb 2025 20:40:51 GMT
- Title: Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems
- Authors: Zhangqi Duan, Nigel Fernandez, Sri Kanakadandi, Bita Akram, Andrew Lan
- Abstract summary: Knowledge components (KCs) mapped to problems help model student learning, tracking their mastery levels on fine-grained skills. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. A human evaluation shows that the KC tagging produced by our pipeline is reasonably accurate compared to tagging by human domain experts.
- Score: 2.1464087136305774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge components (KCs) mapped to problems help model student learning, tracking students' mastery levels on fine-grained skills and thereby facilitating personalized learning and feedback in online learning platforms. However, crafting and tagging KCs for problems, traditionally performed by human domain experts, is highly labor-intensive. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework that leverages these LLM-generated KCs, which we refer to as KCGen-KT. We conduct extensive quantitative and qualitative evaluations validating the effectiveness of KCGen-KT. On a real-world dataset of student code submissions to open-ended programming problems, KCGen-KT outperforms existing KT methods. We investigate the learning curves of generated KCs and show that LLM-generated KCs have a comparable level of fit to human-written KCs under the performance factor analysis (PFA) model. We also conduct a human evaluation showing that the KC tagging produced by our pipeline is reasonably accurate compared to tagging by human domain experts.
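For context, the PFA model referenced above predicts the probability that a student answers a problem correctly from per-KC parameters and the student's prior practice history. A standard formulation (Pavlik et al., 2009) is shown below; the exact variant used in the paper may differ:

```latex
% sigma is the logistic function, KC(j) the set of KCs tagged to problem j,
% s_{ik} / f_{ik} student i's prior successes / failures on KC k, and
% beta_k, gamma_k, rho_k fitted per-KC parameters.
P(Y_{ij} = 1) = \sigma\!\left( \sum_{k \in \mathrm{KC}(j)} \left( \beta_k + \gamma_k \, s_{ik} + \rho_k \, f_{ik} \right) \right)
```

Level-of-fit comparisons between KC sets are made by fitting this model under each tagging and comparing goodness of fit.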
Related papers
- Sparse Binary Representation Learning for Knowledge Tracing [0.0]
Knowledge tracing (KT) models aim to predict students' future performance based on their historical interactions. Most existing KT models rely exclusively on human-defined knowledge concepts associated with exercises. We propose a KT model, Sparse Binary Representation KT (SBRKT), that generates new KC labels, referred to as auxiliary KCs.
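As a loose illustration of auxiliary KC labels, the sketch below binarizes a low-rank factorization of the student-response matrix; this NMF-based stand-in is an assumption for illustration only, not SBRKT's actual method:

```python
# Toy derivation of sparse binary auxiliary KC labels (illustrative only).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# responses[i, j] = 1 if student i answered exercise j correctly
responses = (rng.random((200, 50)) > 0.5).astype(float)

nmf = NMF(n_components=8, init="nndsvda", random_state=0, max_iter=500)
nmf.fit(responses)
# Each row of components_ scores how strongly exercises load on one
# latent skill; thresholding yields sparse binary KC tags per exercise.
aux_kcs = (nmf.components_ > nmf.components_.mean()).astype(int)  # (8, 50)
print(aux_kcs.sum(axis=0))  # number of auxiliary KCs tagged to each exercise
```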
arXiv Detail & Related papers (2025-01-17T00:45:10Z) - KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly. We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases. Our method uses iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
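A schematic sketch of that iterative self-annotation loop follows; every function here is a toy stand-in invented for illustration, not the paper's training recipe:

```python
# KBAlign-style loop (schematic): self-annotate Q&A pairs and revision
# suggestions from the knowledge base, then train on them, repeatedly.
def generate_qa(model, chunk):
    """Self-annotation: the model writes a Q&A pair about a KB chunk."""
    return (f"What does this passage state? {chunk}", model(chunk))

def fine_tune(model, examples):
    """Stand-in for one fine-tuning round; returns the updated model."""
    return model  # a real implementation would update weights here

def self_adapt(model, knowledge_base, rounds=3):
    for _ in range(rounds):
        qa_pairs = [generate_qa(model, c) for c in knowledge_base]
        # revision suggestions: the model critiques its own answers
        revisions = [(q, model(f"Revise this answer: {a}")) for q, a in qa_pairs]
        model = fine_tune(model, qa_pairs + revisions)
    return model

adapted = self_adapt(lambda text: f"[draft about: {text[:24]}]", ["chunk one", "chunk two"])
```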
arXiv Detail & Related papers (2024-11-22T08:21:03Z) - Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing [59.480951050911436]
We present KCQRL, a framework for automated knowledge concept annotation and question representation learning.
We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets.
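One common recipe for question representation learning is a contrastive objective that pulls questions sharing a knowledge concept together in embedding space; whether KCQRL uses exactly this loss is an assumption here, shown only as a toy illustration:

```python
# Toy InfoNCE contrastive loss over unit-normalized embeddings.
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """Loss is low when anchor is close to positive and far from negatives."""
    pos = np.exp(anchor @ positive / tau)
    neg = np.exp(negatives @ anchor / tau).sum()
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
a = unit(rng.normal(size=16))                # a question embedding
p = unit(a + 0.1 * rng.normal(size=16))      # a question sharing its KC
n = np.stack([unit(rng.normal(size=16)) for _ in range(8)])  # other questions
print(info_nce(a, p, n))
```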
arXiv Detail & Related papers (2024-10-02T16:37:19Z) - Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs [55.317267269115845]
Chain-of-Knowledge (CoK) is a comprehensive framework for knowledge reasoning.
CoK includes methodologies for both dataset construction and model learning.
We conduct extensive experiments with KnowReason, the dataset constructed under this framework.
arXiv Detail & Related papers (2024-06-30T10:49:32Z) - Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions [2.6644846626273457]
We employ GPT-4 to generate KCs for multiple-choice questions (MCQs) in Chemistry and E-Learning.
We analyze discrepancies between the KCs generated by the Large Language Model (LLM) and those written by humans.
We also develop an induction algorithm that clusters questions assessing similar KCs based on their content.
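A minimal sketch of such content-based clustering follows; TF-IDF similarity is a stand-in assumption here, not the paper's induction algorithm:

```python
# Cluster questions whose content suggests they assess similar KCs.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

questions = [
    "Balance the equation for the combustion of methane.",
    "Which coefficients balance C3H8 + O2 -> CO2 + H2O?",
    "Define molar mass and compute it for H2O.",
    "What is the molar mass of CO2?",
]
X = TfidfVectorizer().fit_transform(questions).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)  # questions sharing KC-relevant vocabulary share a cluster id
```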
arXiv Detail & Related papers (2024-05-30T22:57:49Z) - From human experts to machines: An LLM supported approach to ontology and knowledge graph construction [0.0]
Large Language Models (LLMs) have recently gained popularity for their ability to understand and generate human-like natural language.
This work explores the (semi-)automatic construction of KGs facilitated by open-source LLMs.
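As a toy illustration of the kind of LLM-assisted extraction such pipelines build on, the sketch below prompts a model for (subject, relation, object) triples; `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompt format is invented:

```python
# Prompt-based triple extraction for (semi-)automatic KG construction.
import json

PROMPT = (
    "Extract (subject, relation, object) triples from the text below. "
    "Answer with a JSON list of 3-element lists.\n\nText: {text}"
)

def extract_triples(call_llm, text):
    raw = call_llm(PROMPT.format(text=text))  # hypothetical LLM call
    return [tuple(t) for t in json.loads(raw)]

# Usage with a fake LLM so the sketch runs standalone:
fake_llm = lambda _: '[["Marie Curie", "won", "Nobel Prize"]]'
print(extract_triples(fake_llm, "Marie Curie won the Nobel Prize."))
```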
arXiv Detail & Related papers (2024-03-13T08:50:15Z) - A Survey on Knowledge Distillation of Large Language Models [99.11900233108487]
Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities to open-source models.
This paper presents a comprehensive survey of KD's role within the realm of Large Language Models (LLMs).
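For reference, the classic logit-distillation objective (Hinton et al., 2015) that much of this literature builds on combines a hard-label term with a temperature-softened KL term; the LLM distillation methods surveyed in the paper go well beyond this:

```python
# Classic knowledge-distillation loss on a single example's logits.
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    p_t = softmax(teacher_logits, T)                 # softened teacher targets
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))   # soft-target term
    ce = -np.log(softmax(student_logits)[label])     # hard-label term
    return alpha * ce + (1 - alpha) * (T * T) * kl   # T^2 rescales gradients

print(kd_loss(np.array([2.0, 0.5, -1.0]), np.array([3.0, 0.2, -2.0]), label=0))
```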
arXiv Detail & Related papers (2024-02-20T16:17:37Z) - Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering [17.672572064705445]
Large language models (LLMs) equipped with Chain-of-Thought (CoT) have shown impressive reasoning ability in various downstream tasks.
We propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge.
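A schematic of the verify-and-revise idea follows; `retrieve`, `verify`, and `revise` are hypothetical stand-ins, and the paper's actual interaction protocol is more involved:

```python
# KD-CoT-style loop (schematic): check each reasoning step against
# externally retrieved knowledge and replace steps that fail verification.
def verify_and_revise(steps, retrieve, verify, revise):
    checked = []
    for step in steps:
        evidence = retrieve(step)  # query an external knowledge source
        checked.append(step if verify(step, evidence) else revise(step, evidence))
    return checked

# Toy usage: the wrong step is swapped for the retrieved fact.
steps = ["Paris is the capital of Germany.", "Therefore the answer is Paris."]
print(verify_and_revise(
    steps,
    retrieve=lambda s: "Paris is the capital of France.",
    verify=lambda s, e: "Germany" not in s,
    revise=lambda s, e: e,
))
```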
arXiv Detail & Related papers (2023-08-25T09:23:55Z) - KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA).
We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks.
We use both Wikipedia, a corpus on which LLMs are commonly pre-trained, and continuously collected emerging corpora to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z) - Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured as the performance of a k-nearest-neighbor (KNN) classifier trained to predict the labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
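The CL measure as described translates almost directly into code: cluster the representations with K-means, then score a KNN classifier on recovering the cluster labels. The sketch below follows that description; the cluster count, neighbor count, and cross-validation setup are assumptions:

```python
# Cluster Learnability: KNN accuracy at predicting K-means labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
reps = rng.normal(size=(300, 32))  # stand-in for learned representations
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(reps)
cl = cross_val_score(KNeighborsClassifier(n_neighbors=5), reps, labels, cv=5).mean()
print(f"Cluster Learnability ~ {cl:.3f}")  # higher = more learnable representation
```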
arXiv Detail & Related papers (2022-06-02T19:05:13Z)