Adaptive Reinforcement Learning for Unobservable Random Delays
- URL: http://arxiv.org/abs/2506.14411v1
- Date: Tue, 17 Jun 2025 11:11:37 GMT
- Title: Adaptive Reinforcement Learning for Unobservable Random Delays
- Authors: John Wikman, Alexandre Proutiere, David Broman
- Abstract summary: We introduce a general framework that enables agents to adaptively handle unobservable and time-varying delays. Specifically, the agent generates a matrix of possible future actions to handle both unpredictable delays and lost action packets sent over networks. Our method significantly outperforms state-of-the-art approaches across a wide range of benchmark environments.
- Score: 46.04329493317009
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In standard Reinforcement Learning (RL) settings, the interaction between the agent and the environment is typically modeled as a Markov Decision Process (MDP), which assumes that the agent observes the system state instantaneously, selects an action without delay, and executes it immediately. In real-world dynamic environments, such as cyber-physical systems, this assumption often breaks down due to delays in the interaction between the agent and the system. These delays can vary stochastically over time and are typically unobservable, meaning they are unknown when deciding on an action. Existing methods deal with this uncertainty conservatively by assuming a known fixed upper bound on the delay, even if the delay is often much lower. In this work, we introduce the interaction layer, a general framework that enables agents to adaptively and seamlessly handle unobservable and time-varying delays. Specifically, the agent generates a matrix of possible future actions to handle both unpredictable delays and lost action packets sent over networks. Building on this framework, we develop a model-based algorithm, Actor-Critic with Delay Adaptation (ACDA), which dynamically adjusts to delay patterns. Our method significantly outperforms state-of-the-art approaches across a wide range of locomotion benchmark environments.
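The action-matrix idea admits a compact illustration. One plausible reading (ours, not the paper's published interface) is that every packet the agent sends carries one action row per possible staleness value, and the actuator executes the row matching the delay the packet actually incurred; stale packets are superseded and lost packets are covered by the rows of an older one. A minimal sketch under those assumptions, with `ActuatorBuffer`, the shapes, and the zero-action fallback all hypothetical:

```python
import numpy as np

MAX_DELAY = 4   # assumed worst-case staleness covered by one packet
ACT_DIM = 2     # assumed action dimensionality

class ActuatorBuffer:
    """Device-side buffer that executes from the freshest packet received."""

    def __init__(self):
        self.latest_stamp = -1   # send-time of the freshest packet so far
        self.actions = None      # its action matrix, shape (MAX_DELAY, ACT_DIM)

    def receive(self, send_time, action_matrix):
        # Keep only the most recently *sent* packet; late stragglers and
        # lost packets are simply superseded by newer arrivals.
        if send_time > self.latest_stamp:
            self.latest_stamp = send_time
            self.actions = action_matrix

    def act(self, now):
        # Realized staleness = how old the freshest packet is at execution time.
        delay = now - self.latest_stamp
        if self.actions is None or delay >= self.actions.shape[0]:
            return np.zeros(ACT_DIM)   # fallback when no row covers this delay
        return self.actions[delay]     # row selected by the realized delay
```

On this reading, the learning problem is how to fill the rows, which is where a model of the delay pattern (as in ACDA) can adapt to typical delays instead of always planning for the worst case.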
Related papers
- Reinforcement Learning via Conservative Agent for Environments with Random Delays [2.115993069505241]
We propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This enables any state-of-the-art constant-delay method to be directly extended to random-delay environments without modifying the algorithmic structure or sacrificing performance.
arXiv Detail & Related papers (2025-07-25T06:41:06Z)
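The conservative reformulation is easy to sketch. Assuming the random delay is bounded by some known `D_MAX` (this bound, and the buffer below, are our illustration rather than the authors' code), holding every observation back until it is exactly `D_MAX` steps old makes the delay constant by construction, so any constant-delay method applies unchanged:

```python
D_MAX = 3  # assumed known upper bound on the random delay

class ConservativeBuffer:
    def __init__(self):
        self.arrived = {}  # generation step -> observation

    def push(self, gen_step, obs):
        # Store an observation stamped with the step it was generated at.
        self.arrived[gen_step] = obs

    def pull(self, now):
        # Expose only the observation generated exactly D_MAX steps ago;
        # fresher arrivals are deliberately held back. This trades
        # freshness for a constant, predictable delay.
        return self.arrived.pop(now - D_MAX, None)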
- Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation [10.511062258286335]
In real-world multi-agent systems, observation delays are ubiquitous, preventing agents from making decisions based on the environment's true state. These discrete observation components with varying delay characteristics pose significant challenges for multi-agent reinforcement learning. We first formulate the decentralized stochastic individual delay partially observable decision process (DSID-POMDP) by extending the standard Dec-POMDP. We then propose the Rainbow Delay Compensation (RDC) framework for addressing individual delays, along with recommended implementations for its constituent modules.
arXiv Detail & Related papers (2025-05-06T14:47:56Z)
- Tree Search-Based Policy Optimization under Stochastic Execution Delay [46.849634120584646]
Delayed execution MDPs are a new formalism addressing random delays without resorting to state augmentation.
We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies.
We devise DEZ, a model-based algorithm that optimizes over the class of Markov policies.
arXiv Detail & Related papers (2024-04-08T12:19:04Z)
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common stochastic approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a delay-adaptive algorithm for multi-agent stochastic approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
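A delay-adaptive server step can be sketched generically. The weighting rule below is an illustrative assumption, not DASA's actual update: each agent reports a stochastic gradient together with its staleness, and the server downweights stale reports so stragglers cannot dominate the iterate.

```python
import numpy as np

def server_update(x, reports, eta=0.1):
    """One server step; reports is a list of (gradient, staleness) pairs."""
    # Downweight each agent's contribution by how stale its gradient is.
    weights = np.array([1.0 / (1.0 + tau) for _, tau in reports])
    weights /= weights.sum()                       # normalize across agents
    g = sum(w * g_i for w, (g_i, _) in zip(weights, reports))
    return x - eta * g                             # stochastic-approximation step
```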
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
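The mechanism is roughly sketchable: each agent attends only over its K nearest neighbors, with the pairwise-relative poses folded into the keys so the representation is shared across viewpoints. Everything below (the pose encoding, K, the projections) is an illustrative assumption rather than KNARPE's actual design:

```python
import numpy as np

def knn_attention(feats, pos, K=4):
    """feats: (N, D) agent features; pos: (N, 2) agent positions."""
    N, D = feats.shape
    out = np.empty_like(feats)
    for i in range(N):
        d = np.linalg.norm(pos - pos[i], axis=1)
        nbrs = np.argsort(d)[:K]                 # K nearest agents (incl. self)
        rel = pos[nbrs] - pos[i]                 # pairwise-relative poses
        # Crude stand-in for a learned relative-pose encoding in the keys.
        keys = feats[nbrs] + np.tanh(rel) @ np.ones((2, D))
        scores = keys @ feats[i] / np.sqrt(D)
        w = np.exp(scores - scores.max())
        w /= w.sum()                             # softmax over neighbors only
        out[i] = w @ feats[nbrs]                 # weighted value aggregation
    return out
```

Restricting attention to K neighbors is what makes the cost scale with the local neighborhood rather than the full scene, which is the efficiency claim in the summary.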
- Delays in Reinforcement Learning [2.5835347022640254]
This dissertation studies delays in the agent's observation of the environment's state or in the execution of the agent's actions.
A wide spectrum of delays will be considered, and potential solutions will be presented.
arXiv Detail & Related papers (2023-09-20T07:04:46Z)
- MTD: Multi-Timestep Detector for Delayed Streaming Perception [0.5439020425819]
Streaming perception is the task of reporting the current state of the world; it is used to evaluate the delay and accuracy of autonomous driving systems.
This paper proposes the Multi-Timestep Detector (MTD), an end-to-end detector that uses dynamic routing for multi-branch future prediction.
The proposed method has been evaluated on the Argoverse-HD dataset, and the experimental results show that it has achieved state-of-the-art performance across various delay settings.
arXiv Detail & Related papers (2023-09-13T06:23:58Z)
- Neural Laplace Control for Continuous-time Delayed Systems [76.81202657759222]
We propose a continuous-time model-based offline RL method that combines a Neural Laplace dynamics model with a model predictive control (MPC) planner.
We show experimentally that on continuous-time delayed environments it achieves near-expert policy performance.
arXiv Detail & Related papers (2023-02-24T12:40:28Z)
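The control loop this entry describes can be sketched as shooting-based MPC over a learned dynamics model queried at arbitrary future times; `dynamics` and `reward` below are hypothetical stand-ins for the learned Neural Laplace model and the task reward, and the random-shooting planner is an assumption made for brevity:

```python
import numpy as np

def mpc_plan(state, dynamics, reward, act_delay, horizon=10, n_samples=256):
    """Return the first action of the best sampled action sequence.

    `act_delay` shifts the rollout start so the plan accounts for the
    time the chosen action spends in transit before it is executed.
    """
    rng = np.random.default_rng(0)
    best_ret, best_seq = -np.inf, None
    for _ in range(n_samples):
        seq = rng.uniform(-1, 1, size=(horizon, 1))   # candidate action sequence
        s, ret = state, 0.0
        for k, a in enumerate(seq):
            # Continuous-time model: predict the state act_delay + k ahead.
            s = dynamics(s, a, t=act_delay + k)
            ret += reward(s, a)
        if ret > best_ret:
            best_ret, best_seq = ret, seq
    return best_seq[0]
```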
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.