Related papers: Online Adaptive Reinforcement Learning with Echo State Networks for Non-Stationary Dynamics

URL: http://arxiv.org/abs/2602.06326v1
Date: Fri, 06 Feb 2026 02:51:01 GMT
Title: Online Adaptive Reinforcement Learning with Echo State Networks for Non-Stationary Dynamics
Authors: Aoi Yoshimura, Gouhei Tanaka
Abstract summary: In this paper, we propose a lightweight online adaptation framework for reinforcement learning (RL) based on Reservoir Computing. Specifically, we integrate an Echo State Network (ESN) as an adaptation module that encodes recent observation histories into a latent context representation. We evaluate the proposed method on CartPole and HalfCheetah tasks with severe and abrupt environment changes.
Score: 0.5745796568988237
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) policies trained in simulation often suffer from severe performance degradation when deployed in real-world environments due to non-stationary dynamics. While Domain Randomization (DR) and meta-RL have been proposed to address this issue, they typically rely on extensive pretraining, privileged information, or high computational cost, limiting their applicability to real-time and edge systems. In this paper, we propose a lightweight online adaptation framework for RL based on Reservoir Computing. Specifically, we integrate an Echo State Network (ESN) as an adaptation module that encodes recent observation histories into a latent context representation, and update its readout weights online using Recursive Least Squares (RLS). This design enables rapid adaptation without backpropagation, pretraining, or access to privileged information. We evaluate the proposed method on CartPole and HalfCheetah tasks with severe and abrupt environment changes, including periodic external disturbances and extreme friction variations. Experimental results demonstrate that the proposed approach significantly outperforms DR and representative adaptive baselines under out-of-distribution dynamics, achieving stable adaptation within a few control steps. Notably, the method successfully handles intra-episode environment changes without resetting the policy. Due to its computational efficiency and stability, the proposed framework provides a practical solution for online adaptation in non-stationary environments and is well suited for real-world robotic control and edge deployment.
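
Illustrative sketch (not the authors' implementation): the abstract describes a fixed reservoir whose linear readout is updated online with RLS, with no backpropagation. The Python below shows one common way to do that. The class name ESNReadoutRLS, the hyperparameters (reservoir size, spectral radius, forgetting factor), and the choice of the next observation as the RLS target are all assumptions made for the example; the page does not say what target or settings the paper uses.

```python
import numpy as np


class ESNReadoutRLS:
    """Echo State Network with a linear readout trained online by RLS.

    Hypothetical helper for illustration; all defaults are assumed values.
    """

    def __init__(self, n_in, n_res, n_out, spectral_radius=0.9,
                 forgetting=0.995, delta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input and recurrent weights (never trained).
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
        W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_out = np.zeros((n_out, n_res))  # trainable readout
        self.P = np.eye(n_res) / delta         # RLS inverse-correlation estimate
        self.lam = forgetting
        self.x = np.zeros(n_res)               # reservoir state

    def step(self, obs):
        """Advance the reservoir by one observation and return the readout."""
        self.x = np.tanh(self.W_in @ obs + self.W @ self.x)
        return self.W_out @ self.x

    def update(self, target):
        """One RLS update of the readout; returns the pre-update error."""
        x = self.x
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)           # gain vector
        err = target - self.W_out @ x          # a priori error
        self.W_out += np.outer(err, k)
        # P stays symmetric, so x^T P equals (P x)^T.
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err


def demo(steps=500):
    """Toy task (assumed): predict the next sample of a sine wave."""
    esn = ESNReadoutRLS(n_in=1, n_res=30, n_out=1)
    errs = []
    for t in range(steps):
        obs = np.array([np.sin(0.2 * t)])
        nxt = np.array([np.sin(0.2 * (t + 1))])
        esn.step(obs)
        errs.append(abs(esn.update(nxt)[0]))
    # Mean absolute error over the first and last 50 steps.
    return np.mean(errs[:50]), np.mean(errs[-50:])
```

Calling demo() returns the mean absolute prediction error over the first and last 50 steps of the toy signal. In an RL setting, the readout output (or the reservoir state itself) would be passed to the policy as a context vector; how the paper wires this up is not stated on this page.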

Related papers

When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training [58.25341036646294]
We analytically examine why learning recurrent poles does not provide tangible benefits in data and empirically offer real-time learning scenarios. We show that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
arXiv Detail & Related papers (2026-02-25T00:15:13Z)
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning [24.80806018678682]
Reinforcement learning (RL) offers a principled way to enhance the reasoning capabilities of large language models. In practice, RL progress often slows when task difficulty becomes poorly aligned with model capability. We propose a framework that sustains effective learning signals through adaptive environment design.
arXiv Detail & Related papers (2026-01-08T10:42:04Z)
TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration [64.32072516882947]
Diffusion Policy excels in embodied control but suffers from high inference latency and computational cost. We propose Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP). TS-DP achieves up to 4.17 times faster inference with over 94% accepted drafts, reaching an inference frequency of 25 Hz.
arXiv Detail & Related papers (2025-12-13T07:53:14Z)
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
RISE: Robust Imitation through Stochastic Encoding [0.764671395172401]
We propose a novel imitation-learning framework that explicitly addresses erroneous measurements of environment parameters into policy learning. Our framework encodes parameters such as obstacle state, orientation, and velocity into a latent space to improve test time. We validate our approach on two robotic platforms and demonstrate improved safety while maintaining goal-reaching performance compared to baseline methods.
arXiv Detail & Related papers (2025-03-15T19:52:16Z)
Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments. Online RL-based RS also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies. Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline. We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems [25.18963930580529]
Reinforcement Learning (RL) has garnered increasing attention for its ability to optimize user retention in recommender systems. This paper introduces a novel approach called Adaptive User Retention Optimization (AURO) to address this challenge.
arXiv Detail & Related papers (2023-10-06T02:45:21Z)
Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments. We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection [34.77250498401055]
This paper introduces an adaptive model-free deep reinforcement approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling. In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework.
arXiv Detail & Related papers (2021-04-01T02:14:01Z)
Few-shot model-based adaptation in noisy conditions [15.498933340900606]
We propose to perform few-shot adaptation of dynamics models in noisy conditions using an uncertainty-aware Kalman filter-based neural network architecture. We show that the proposed method, which explicitly addresses domain noise, improves few-shot adaptation error over a blackbox adaptation LSTM baseline. The proposed method also allows for system analysis by analyzing hidden states of the model during and after adaptation.
arXiv Detail & Related papers (2020-10-16T13:59:35Z)
Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences. We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO). We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.