Simulated Human Learning in a Dynamic, Partially-Observed, Time-Series Environment
- URL: http://arxiv.org/abs/2511.15032v1
- Date: Wed, 19 Nov 2025 01:57:52 GMT
- Title: Simulated Human Learning in a Dynamic, Partially-Observed, Time-Series Environment
- Authors: Jeffrey Jiang, Kevin Hong, Emily Kuczynski, Gregory Pottie
- Abstract summary: We develop a time-series environment to simulate a classroom setting, with student-teacher interventions. We develop reinforcement learning ITSs that combine learning the individual state of students while pulling from population information. We find that our policies are able to boost the performance of quiz and midterm structures more than we can in a finals-only structure.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While intelligent tutoring systems (ITSs) can use information from past students to personalize instruction, each new student is unique. Moreover, the education problem is inherently difficult because the learning process is only partially observable. We therefore develop a dynamic, time-series environment to simulate a classroom setting, with student-teacher interventions - including tutoring sessions, lectures, and exams. In particular, we design the simulated environment to allow for varying levels of probing interventions that can gather more information. Then, we develop reinforcement learning ITSs that combine learning the individual state of students while pulling from population information through the use of probing interventions. These interventions can reduce the difficulty of student estimation, but also introduce a cost-benefit decision to find a balance between probing enough to get accurate estimates and probing so often that it becomes disruptive to the student. We compare the efficacy of standard RL algorithms with several greedy rules-based heuristic approaches to find that they provide different solutions, but with similar results. We also highlight the difficulty of the problem with increasing levels of hidden information, and the boost that we get if we allow for probing interventions. We show the flexibility of both heuristic and RL policies with regards to changing student population distributions, finding that both are flexible, but RL policies struggle to help harder classes. Finally, we test different course structures with non-probing policies and we find that our policies are able to boost the performance of quiz and midterm structures more than we can in a finals-only structure, highlighting the benefit of having additional information.
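The abstract's core mechanism, a hidden student state that an ITS can only read through noisy exams or explicitly costed probing interventions, can be illustrated with a minimal sketch. Everything below (the scalar `mastery` state, the noise levels, the `probe_cost` penalty, the action names) is a hypothetical simplification for illustration, not the paper's actual environment.

```python
import random

class SimulatedStudent:
    """Minimal sketch of a partially observed student model.

    A hidden scalar 'mastery' drifts upward with tutoring and is only
    visible through noisy exam scores or, more precisely, through an
    explicit probing intervention that carries a disruption cost.
    All dynamics and constants here are illustrative assumptions.
    """

    def __init__(self, mastery=0.2, probe_cost=0.05):
        self.mastery = mastery        # hidden state, never exposed directly
        self.probe_cost = probe_cost  # disruption penalty for probing

    def step(self, action):
        """Apply one intervention and return (observation, reward)."""
        if action == "tutor":
            # tutoring improves the hidden state but yields no observation
            self.mastery = min(1.0, self.mastery + random.uniform(0.0, 0.1))
            return None, 0.0
        if action == "probe":
            # probing gives a low-noise estimate at a fixed disruption cost
            return self.mastery + random.gauss(0, 0.02), -self.probe_cost
        if action == "exam":
            # exams give a high-noise estimate; reward reflects true mastery
            return self.mastery + random.gauss(0, 0.15), self.mastery
        raise ValueError(f"unknown action: {action}")
```

Under this toy model, the cost-benefit tension the abstract describes appears directly: a policy that never probes must estimate `mastery` from high-variance exam signals alone, while a policy that probes every step pays `probe_cost` repeatedly.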
Related papers
- UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models [59.693733170193944]
Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings. Recent reinforcement learning approaches address this limitation but face two critical challenges. We propose the Unidirectional Cognitive Optimization (UCO) method to address these challenges.
arXiv Detail & Related papers (2025-11-12T01:27:02Z) - Who Is Lagging Behind: Profiling Student Behaviors with Graph-Level Encoding in Curriculum-Based Online Learning Systems [0.4775214751904462]
Student profiling is crucial for tracking progress, identifying struggling students, and alleviating disparities among students. We introduce CTGraph, a graph-level representation learning approach to profile learner behaviors and performance in a self-supervised manner. Our approach opens more opportunities to empower educators with rich insights into student learning journeys.
arXiv Detail & Related papers (2025-08-26T11:03:00Z) - Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties. Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by solutions previously proposed for these difficulties.
arXiv Detail & Related papers (2024-07-09T17:55:23Z) - Enhancing Student Performance Prediction on Learnersourced Questions with SGNN-LLM Synergy [11.735587384038753]
We introduce an innovative strategy that synergizes the potential of integrating Signed Graph Neural Networks (SGNNs) and Large Language Model (LLM) embeddings.
Our methodology employs a signed bipartite graph to comprehensively model student answers, complemented by a contrastive learning framework that enhances noise resilience.
arXiv Detail & Related papers (2023-09-23T23:37:55Z) - Getting too personal(ized): The importance of feature choice in online adaptive algorithms [6.716421415117937]
We consider whether and when attempting to discover how to personalize has a cost, such as when adaptation to personal information can delay the adoption of policies that benefit all students.
We explore these issues in the context of using multi-armed bandit (MAB) algorithms to learn a policy for what version of an educational technology to present to each student.
We demonstrate that the inclusion of student characteristics for personalization can be beneficial when those characteristics are needed to learn the optimal action.
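The trade-off this entry describes, paying a learning cost for personalization that only helps when the optimal action actually depends on student characteristics, can be sketched with a toy two-arm bandit. The student feature, the arms, and the reward function below are all hypothetical; only the compare-with-and-without-features structure follows the paper's setup.

```python
import random
from collections import defaultdict

def epsilon_greedy_bandit(contexts, rewards, use_features, eps=0.1, seed=0):
    """Hedged sketch: a two-arm epsilon-greedy bandit that either
    conditions its value estimates on a per-student feature
    (use_features=True) or pools all students into one estimate.
    Returns the total reward collected over the run."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    values = defaultdict(float)
    total = 0.0
    for ctx in contexts:
        key = ctx if use_features else "all"
        # explore with probability eps, or until both arms have been tried
        if rng.random() < eps or counts[(key, 0)] == 0 or counts[(key, 1)] == 0:
            arm = rng.randrange(2)
        else:
            arm = max((0, 1), key=lambda a: values[(key, a)])
        r = rewards(ctx, arm)
        counts[(key, arm)] += 1
        # incremental mean update of the chosen arm's value estimate
        values[(key, arm)] += (r - values[(key, arm)]) / counts[(key, arm)]
        total += r
    return total
```

When the best arm differs by student (e.g. reward 1 only when the arm matches the feature), the personalized bandit eventually wins; when one arm is best for everyone, pooling learns faster because it shares data across all students.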
arXiv Detail & Related papers (2023-09-06T09:34:54Z) - When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from curriculum learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z) - Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z) - Dynamic Diagnosis of the Progress and Shortcomings of Student Learning using Machine Learning based on Cognitive, Social, and Emotional Features [0.06999740786886534]
Student diversity can be challenging because it adds variability to the ways students learn and progress over time.
A single teaching approach is likely to be ineffective and result in students not meeting their potential.
This paper discusses a novel methodology based on data analytics and Machine Learning to measure and causally diagnose the progress and shortcomings of student learning.
arXiv Detail & Related papers (2022-04-13T21:14:58Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z) - Mutual Information Based Knowledge Transfer Under State-Action Dimension Mismatch [14.334987432342707]
We propose a new framework for transfer learning where the teacher and the student can have arbitrarily different state- and action-spaces.
To handle this mismatch, we produce embeddings which can systematically extract knowledge from the teacher policy and value networks.
We demonstrate successful transfer learning in situations when the teacher and student have different state- and action-spaces.
arXiv Detail & Related papers (2020-06-12T09:51:17Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.