KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery
- URL: http://arxiv.org/abs/2505.06469v1
- Date: Fri, 09 May 2025 23:47:58 GMT
- Title: KCluster: An LLM-based Clustering Approach to Knowledge Component Discovery
- Authors: Yumou Wei, Paulo Carvalho, John Stamper
- Abstract summary: We propose KCluster, a novel KC discovery algorithm based on identifying clusters of congruent questions. We demonstrate in three datasets that an LLM can create an effective metric of question similarity. KCluster generates descriptive KC labels and discovers KC models that predict student performance better than the best expert-designed models.
- Score: 0.26626950367610397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Educators evaluate student knowledge using knowledge component (KC) models that map assessment questions to KCs. Still, designing KC models for large question banks remains an insurmountable challenge for instructors who need to analyze each question by hand. The growing use of Generative AI in education is expected only to aggravate this chronic deficiency of expert-designed KC models, as course engineers designing KCs struggle to keep up with the pace at which questions are generated. In this work, we propose KCluster, a novel KC discovery algorithm based on identifying clusters of congruent questions according to a new similarity metric induced by a large language model (LLM). We demonstrate in three datasets that an LLM can create an effective metric of question similarity, which a clustering algorithm can use to create KC models from questions with minimal human effort. Combining the strengths of LLM and clustering, KCluster generates descriptive KC labels and discovers KC models that predict student performance better than the best expert-designed models available. In anticipation of future work, we illustrate how KCluster can reveal insights into difficult KCs and suggest improvements to instruction.
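The abstract above describes KCluster's core recipe: an LLM induces a pairwise question-similarity metric, and a clustering algorithm then groups congruent questions into KCs. The sketch below illustrates that idea only; it is not the authors' code. The similarity scores are hypothetical stand-ins for LLM-derived values, and the threshold-plus-union-find grouping is a simple stand-in for the paper's clustering step.

```python
def cluster_questions(similarity, threshold=0.8):
    """Group question indices whose pairwise similarity exceeds a threshold.

    similarity: dict mapping frozenset({i, j}) -> score in [0, 1].
    Returns a list of clusters (sets of question indices), one per KC.
    """
    # Collect every question id mentioned in the similarity matrix.
    questions = set()
    for pair in similarity:
        questions |= pair

    # Union-find over questions: link any pair above the threshold.
    parent = {q: q for q in questions}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for pair, score in similarity.items():
        if score >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    # Questions sharing a root form one cluster (one candidate KC).
    clusters = {}
    for q in questions:
        clusters.setdefault(find(q), set()).add(q)
    return list(clusters.values())


# Hypothetical LLM-derived similarity scores for four questions.
sims = {
    frozenset({0, 1}): 0.92,  # q0 and q1 appear to test the same skill
    frozenset({1, 2}): 0.85,
    frozenset({0, 2}): 0.30,  # still grouped with q1's cluster (transitivity)
    frozenset({2, 3}): 0.10,  # q3 stands alone -> its own KC
    frozenset({0, 3}): 0.05,
    frozenset({1, 3}): 0.12,
}
print(sorted(map(sorted, cluster_questions(sims))))  # → [[0, 1, 2], [3]]
```

Lowering the threshold merges clusters, mirroring how a coarser similarity cutoff yields fewer, broader KCs.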
Related papers
- MAS-KCL: Knowledge component graph structure learning with large language model-based agentic workflow [12.083628171166733]
An accurate KC graph can assist educators in identifying the root causes of learners' poor performance on specific KCs. We have developed a KC graph structure learning algorithm, named MAS-KCL, which employs a multi-agent system driven by large language models.
arXiv Detail & Related papers (2025-05-20T09:32:47Z)
- Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems [2.1464087136305774]
Knowledge components (KCs) mapped to problems help model student learning, tracking their mastery levels on fine-grained skills. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We conduct a human evaluation showing that the KC tagging accuracy of our pipeline is reasonably close to that of human domain experts.
arXiv Detail & Related papers (2025-02-25T20:40:51Z)
- Sparse Binary Representation Learning for Knowledge Tracing [0.0]
Knowledge tracing (KT) models aim to predict students' future performance based on their historical interactions. Most existing KT models rely exclusively on human-defined knowledge concepts associated with exercises. We propose a KT model, Sparse Binary Representation KT (SBRKT), that generates new KC labels, referred to as auxiliary KCs.
arXiv Detail & Related papers (2025-01-17T00:45:10Z)
- Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing [59.480951050911436]
We present KCQRL, a framework for automated knowledge concept annotation and question representation learning. We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets.
arXiv Detail & Related papers (2024-10-02T16:37:19Z)
- Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions [2.6644846626273457]
We employ GPT-4 to generate KCs for multiple-choice questions (MCQs) in Chemistry and E-Learning.
We analyzed discrepancies between the KCs generated by the Large Language Model (LLM) and those made by humans.
We also developed an induction algorithm to cluster questions that assess similar KCs based on their content.
arXiv Detail & Related papers (2024-05-30T22:57:49Z)
- A Knowledge-Injected Curriculum Pretraining Framework for Question Answering [70.13026036388794]
We propose a general Knowledge-Injected Curriculum Pretraining framework (KICP) to achieve comprehensive KG learning and exploitation for Knowledge-based question answering tasks.
The KI module first injects knowledge into the LM by generating a KG-centered pretraining corpus, and generalizes the process into three key steps.
The KA module learns knowledge from the generated corpus with LM equipped with an adapter as well as keeps its original natural language understanding ability.
The CR module follows human reasoning patterns to construct three corpora with increasing difficulties of reasoning, and further trains the LM from easy to hard in a curriculum manner.
arXiv Detail & Related papers (2024-03-11T03:42:03Z)
- KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection [48.067722381794]
Large Language Models (LLMs) have demonstrated remarkable human-level natural language generation capabilities.
Their potential to generate misinformation, often called the hallucination problem, poses a significant risk to their deployment.
We propose a knowledge-constrained decoding method called KCTS, which guides a frozen LM to generate text aligned with the reference knowledge.
arXiv Detail & Related papers (2023-10-13T12:12:34Z)
- Towards Understanding Mixture of Experts in Deep Learning [95.27215939891511]
We study how the MoE layer improves the performance of neural network learning.
Our results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE.
arXiv Detail & Related papers (2022-08-04T17:59:10Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
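The CL metric described above can be sketched end to end in a few lines: label representations with K-means, then score a nearest-neighbour classifier at recovering those labels on held-out points. This is a toy stand-in, not the authors' implementation; the choice of 1-NN, K=2, the even/odd split, and the example data are all assumptions for illustration.

```python
import math
import random

def dist(a, b):
    return math.dist(a, b)

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def cluster_learnability(points, k=2):
    """CL sketch: K-means labels, scored by 1-NN on a held-out half."""
    labels = kmeans(points, k)
    # Deterministic split: even indices train, odd indices held out.
    train = [(p, lab) for i, (p, lab) in enumerate(zip(points, labels)) if i % 2 == 0]
    test = [(p, lab) for i, (p, lab) in enumerate(zip(points, labels)) if i % 2 == 1]
    correct = 0
    for p, lab in test:
        # 1-NN prediction: the K-means label of the closest training point.
        _, pred = min(train, key=lambda tl: dist(p, tl[0]))
        correct += pred == lab
    return correct / len(test)

# Two well-separated 2-D blobs, ordered so both splits see each blob.
blobs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0),
         (0.0, 0.1), (0.1, 0.1), (5.0, 5.1), (5.1, 5.1)]
print(cluster_learnability(blobs))  # separable structure -> CL = 1.0
```

A representation whose clusters are easy for a KNN to recover scores high CL; entangled or unstructured representations score lower.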
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion [99.47414073164656]
A comprehensive knowledge graph (KG) contains an instance-level entity graph and an ontology-level concept graph.
The two-view KG provides a testbed for models to "simulate" human abilities in knowledge abstraction, concretization, and completion.
We propose a unified KG benchmark by improving existing benchmarks in terms of dataset scale, task coverage, and difficulty.
arXiv Detail & Related papers (2020-04-28T16:21:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.