Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance
- URL: http://arxiv.org/abs/2512.13658v1
- Date: Mon, 15 Dec 2025 18:51:00 GMT
- Title: Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance
- Authors: Mohammadreza Molavi, Mohammad Moein, Mohammadreza Tavakoli, Abdolali Faraji, Stefan T. Mol, Gábor Kismihók
- Abstract summary: Large Language Models (LLMs) are attracting growing interest for their potential to create learning resources that better support personalization. We propose a framework that supports the cost-effective automation of evaluating alignment between educational resources and intended learning outcomes.
- Score: 0.9236074230806578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the online learning landscape evolves, the need for personalization is increasingly evident. Although educational resources are burgeoning, educators face challenges selecting materials that both align with intended learning outcomes and address diverse learner needs. Large Language Models (LLMs) are attracting growing interest for their potential to create learning resources that better support personalization, but verifying coverage of intended outcomes still requires human alignment review, which is costly and limits scalability. We propose a framework that supports the cost-effective automation of evaluating alignment between educational resources and intended learning outcomes. Using human-generated materials, we benchmarked LLM-based text-embedding models and found that the most accurate model (Voyage) achieved 79% accuracy in detecting alignment. We then applied the optimal model to LLM-generated resources and, via expert evaluation, confirmed that it reliably assessed correspondence to intended outcomes (83% accuracy). Finally, in a three-group experiment with 360 learners, higher alignment scores were positively related to greater learning performance, chi-squared(2, N = 360) = 15.39, p < 0.001. These findings show that embedding-based alignment scores can facilitate scalable personalization by confirming alignment with learning outcomes, which allows teachers to focus on tailoring content to diverse learner needs.
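The alignment scoring the abstract describes can be illustrated with a minimal sketch: embed the intended learning outcome and each candidate resource with a text-embedding model, then rank resources by cosine similarity to the outcome vector. The toy vectors, resource names, and threshold below are illustrative assumptions, not values from the paper; in practice the embeddings would come from a model such as Voyage.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_resources(outcome_vec, resource_vecs):
    """Rank resources by embedding alignment with a learning outcome,
    highest similarity first."""
    scored = [(name, cosine_similarity(outcome_vec, vec))
              for name, vec in resource_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy 3-d vectors standing in for real text embeddings.
outcome = [1.0, 0.2, 0.0]
resources = {
    "aligned_tutorial": [0.9, 0.3, 0.1],   # semantically close to the outcome
    "off_topic_video":  [0.0, 0.1, 1.0],   # unrelated content
}

ranking = rank_resources(outcome, resources)
# A hypothetical threshold turns the ranking into an aligned/unaligned decision.
aligned = [name for name, score in ranking if score >= 0.7]
```

In the paper's pipeline, a decision threshold like the one above is what gets benchmarked against human alignment judgments to measure detection accuracy.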
Related papers
- TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models [10.963195858672627]
TutorBench is a dataset and evaluation benchmark designed to rigorously evaluate the core tutoring skills of large language models (LLMs). Samples are drawn from three common tutoring tasks: (i) generating adaptive explanations tailored to a student's confusion, (ii) providing actionable feedback on a student's work, and (iii) promoting active learning through effective hint generation. We evaluate 16 frontier LLMs on TutorBench and present a detailed analysis of their performance and behavior.
arXiv Detail & Related papers (2025-10-03T01:41:09Z) - Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning [4.990353320509215]
Large Language Models (LLMs) are increasingly envisioned as intelligent assistants for personalized learning. This study presents an empirical comparison of three state-of-the-art LLMs on a tutoring task simulating a realistic learning setting.
arXiv Detail & Related papers (2025-09-02T14:21:59Z) - Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning [17.558663729465692]
EduAlign is a framework designed to guide large language models (LLMs) toward becoming more effective and responsible educational assistants. In the first stage, we curate a dataset of 8k educational interactions and annotate them, both manually and automatically, along three key educational dimensions: Helpfulness, Personalization, and Creativity. In the second stage, we leverage HPC-RM as a reward signal to fine-tune a pre-trained LLM using Group Relative Policy Optimization (GRPO) on a set of 2k diverse prompts.
arXiv Detail & Related papers (2025-07-27T15:56:29Z) - Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics [14.157213827899342]
Large language models (LLMs) are revolutionizing the field of education by enabling personalized learning experiences tailored to individual student needs. This paper introduces a framework for Adaptive Learning Systems that leverages LLM-powered analytics for personalized curriculum design.
arXiv Detail & Related papers (2025-07-25T04:36:17Z) - From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning [82.50157695987558]
Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy. We propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors.
arXiv Detail & Related papers (2025-05-21T15:00:07Z) - LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System [54.71619734800526]
GenMentor is a multi-agent framework designed to deliver goal-oriented, personalized learning within ITS. It maps learners' goals to required skills using a fine-tuned LLM trained on a custom goal-to-skill dataset. GenMentor tailors learning content with an exploration-drafting-integration mechanism to align with individual learner needs.
arXiv Detail & Related papers (2025-01-27T03:29:44Z) - KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
We present KBAlign, a self-supervised framework that enhances RAG systems through efficient model adaptation. Our key insight is to leverage the model's intrinsic capabilities for knowledge alignment through two innovative mechanisms. Experiments demonstrate that KBAlign can achieve 90% of the performance gain obtained through GPT-4-supervised adaptation.
arXiv Detail & Related papers (2024-11-22T08:21:03Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
arXiv Detail & Related papers (2024-03-05T09:09:15Z) - QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B-token training corpus with quality ratings for each of the four criteria.
arXiv Detail & Related papers (2024-02-15T06:36:07Z) - BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z) - Can We Trust AI-Generated Educational Content? Comparative Analysis of Human and AI-Generated Learning Resources [4.528957284486784]
Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale.
We compare the quality of resources generated by an LLM with those created by students as part of a learnersourcing activity.
Our results show that the quality of AI-generated resources, as perceived by students, is equivalent to the quality of resources generated by their peers.
arXiv Detail & Related papers (2023-06-18T09:49:21Z) - Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.