Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
- URL: http://arxiv.org/abs/2603.04029v1
- Date: Wed, 04 Mar 2026 13:07:42 GMT
- Title: Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
- Authors: Fabian Domberg, Georg Schildbach,
- Abstract summary: This work presents a framework for online Continual Reinforcement Learning that enables automated adaptation during deployment.<n>The proposed method leverages world model prediction residuals to detect out-of-distribution events and automatically trigger finetuning.<n>The approach is validated on a variety of contemporary continuous control problems, including a quadruped robot in high-fidelity simulation.
- Score: 2.165723322157105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As learning-based robotic controllers are typically trained offline and deployed with fixed parameters, their ability to cope with unforeseen changes during operation is limited. Biologically inspired, this work presents a framework for online Continual Reinforcement Learning that enables automated adaptation during deployment. Building on DreamerV3, a model-based Reinforcement Learning algorithm, the proposed method leverages world model prediction residuals to detect out-of-distribution events and automatically trigger finetuning. Adaptation progress is monitored using both task-level performance signals and internal training metrics, allowing convergence to be assessed without external supervision and domain knowledge. The approach is validated on a variety of contemporary continuous control problems, including a quadruped robot in high-fidelity simulation, and a real-world model vehicle. Relevant metrics and their interpretation are presented and discussed, as well as resulting trade-offs described. The results sketch out how autonomous robotic agents could once move beyond static training regimes toward adaptive systems capable of self-reflection and -improvement during operation, just like their biological counterparts.
Related papers
- From Physics to Machine Learning and Back: Part II - Learning and Observational Bias in PHM [52.64097278841485]
Review examines how incorporating learning and observational biases through physics-informed modeling and data strategies can guide models toward physically consistent and reliable predictions.<n>Fast adaptation methods including meta-learning and few-shot learning are reviewed alongside domain generalization techniques.
arXiv Detail & Related papers (2025-09-25T14:15:43Z) - Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion.<n>Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient.<n>Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z) - World Models for Anomaly Detection during Model-Based Reinforcement Learning Inference [3.591122855617648]
Learning-based controllers are often purposefully kept out of real-world applications due to concerns about their safety and reliability.<n>We explore how state-of-the-art world models in Model-Based Reinforcement Learning can be utilized beyond the training phase to ensure a deployed policy only operates within regions of the state-space it is sufficiently familiar with.
arXiv Detail & Related papers (2025-03-04T12:25:01Z) - Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer.<n>By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z) - AdaCropFollow: Self-Supervised Online Adaptation for Visual Under-Canopy Navigation [31.214318150001947]
Under-canopy agricultural robots can enable various applications like precise monitoring, spraying, weeding, and plant manipulation tasks.
We propose a self-supervised online adaptation method for adapting the semantic keypoint representation using a visual foundational model, geometric prior, and pseudo labeling.
This can enable fully autonomous row-following capability in under-canopy robots across fields and crops without requiring human intervention.
arXiv Detail & Related papers (2024-10-16T09:52:38Z) - Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models [60.87795376541144]
A world model is a neural network capable of predicting an agent's next state given past states and actions.<n>During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations.<n>We present qualitative and quantitative results, demonstrating significant improvements upon prior state of the art in closed-loop testing.
arXiv Detail & Related papers (2024-09-25T06:48:25Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - DiAReL: Reinforcement Learning with Disturbance Awareness for Robust
Sim2Real Policy Transfer in Robot Control [0.0]
Delayed Markov decision processes fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions.
We introduce a disturbance-augmented Markov decision process in delayed settings as a novel representation to incorporate disturbance estimation in training on-policy reinforcement learning algorithms.
arXiv Detail & Related papers (2023-06-15T10:11:38Z) - Model-Based Reinforcement Learning with Isolated Imaginations [61.67183143982074]
We propose Iso-Dream++, a model-based reinforcement learning approach.
We perform policy optimization based on the decoupled latent imaginations.
This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild.
arXiv Detail & Related papers (2023-03-27T02:55:56Z) - Active Learning of Discrete-Time Dynamics for Uncertainty-Aware Model Predictive Control [46.81433026280051]
We present a self-supervised learning approach that actively models the dynamics of nonlinear robotic systems.
Our approach showcases high resilience and generalization capabilities by consistently adapting to unseen flight conditions.
arXiv Detail & Related papers (2022-10-23T00:45:05Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.