DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays
- URL: http://arxiv.org/abs/2406.03102v1
- Date: Wed, 5 Jun 2024 09:45:26 GMT
- Title: DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays
- Authors: Bo Xia, Yilun Kong, Yongzhe Chang, Bo Yuan, Zhiheng Li, Xueqian Wang, Bin Liang
- Abstract summary: We propose $\textbf{DEER}$ (Delay-resilient Encoder-Enhanced RL), a framework designed to enhance interpretability and address random delay issues.
In a variety of delayed scenarios, the trained encoder can seamlessly integrate with standard RL algorithms without requiring additional modifications.
The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.
- Score: 26.032139258562708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions, thereby deviating from the Markov assumption. Existing methods usually tackle this issue with end-to-end solutions using state augmentation. However, these black-box approaches often involve incomprehensible processes and redundant information in the information states, causing instability and potentially undermining the overall performance. To alleviate the delay challenges in RL, we propose $\textbf{DEER (Delay-resilient Encoder-Enhanced RL)}$, a framework designed to effectively enhance interpretability and address random delay issues. DEER employs an encoder, pretrained on delay-free environment datasets, to map delayed states, along with the variable-length past action sequences resulting from different delays, into hidden states. In a variety of delayed scenarios, the trained encoder can seamlessly integrate with standard RL algorithms without requiring additional modifications, enhancing their delay-solving capability simply by adapting the input dimension of the original algorithms. We evaluate DEER through extensive experiments on Gym and Mujoco environments. The results confirm that DEER is superior to state-of-the-art RL algorithms in both constant and random delay settings.
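To make the pipeline concrete, here is a minimal sketch of the encoder idea in PyTorch: a delayed observation and the sequence of actions taken since that observation are fused into a fixed-size hidden state that a standard RL algorithm can then consume in place of the raw observation. All module names, dimensions, and the GRU choice are illustrative assumptions, not the paper's released code.

```python
# A minimal sketch of the DEER idea: encode a delayed observation together
# with the variable-length sequence of actions taken since it was observed.
import torch
import torch.nn as nn


class DelayEncoder(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, hidden_dim)
        # The GRU consumes the pending action sequence; its length equals the
        # (possibly random) delay, so variable delays are handled naturally.
        self.action_rnn = nn.GRU(action_dim, hidden_dim, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, delayed_state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # delayed_state: (batch, state_dim); actions: (batch, delay, action_dim)
        s = self.state_embed(delayed_state)
        _, h = self.action_rnn(actions)            # h: (1, batch, hidden_dim)
        return self.fuse(torch.cat([s, h.squeeze(0)], dim=-1))


# The hidden state replaces the raw observation fed to an off-the-shelf RL
# algorithm; only the algorithm's input dimension changes.
enc = DelayEncoder(state_dim=17, action_dim=6)
z = enc(torch.randn(4, 17), torch.randn(4, 3, 6))  # a delay of 3 steps
print(z.shape)                                     # torch.Size([4, 128])
```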
Related papers
- Tree Search-Based Policy Optimization under Stochastic Execution Delay [46.849634120584646]
Delayed execution MDPs are a new formalism addressing random delays without resorting to state augmentation.
We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies.
We devise DEZ, a model-based algorithm that optimizes over the class of Markov policies.
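The core mechanism can be illustrated with a toy sketch: roll the last observed state forward through the queued actions, whose count equals the observed delay, then act on the predicted current state with a Markov policy. Here `model` and `policy` are hypothetical callables, not the paper's components.

```python
# Acting under an observed execution delay with an approximate model.
from collections import deque
from typing import Any, Callable, Deque


def act_under_delay(
    last_obs: Any,
    pending_actions: Deque[Any],
    model: Callable[[Any, Any], Any],   # approximate transition s' = model(s, a)
    policy: Callable[[Any], Any],       # Markov policy over predicted states
) -> Any:
    s = last_obs
    for a in pending_actions:           # delay == len(pending_actions)
        s = model(s, a)                 # simulate each not-yet-executed action
    action = policy(s)                  # decide as if the state were current
    pending_actions.append(action)      # it takes effect `delay` steps later
    return action
```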
arXiv Detail & Related papers (2024-04-08T12:19:04Z) - Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays [41.52768902667611]
Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions.
We present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays.
Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays.
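A schematic of the bootstrapping step follows, under the assumption of discrete actions and DQN-style critics; the networks (`q_long`, `q_short_target`) and batch fields are hypothetical names, not the paper's exact algorithm.

```python
# The auxiliary-delay trick in schematic form: a critic trained for a short
# auxiliary delay serves as the bootstrap target for the long-delay critic.
import torch
import torch.nn.functional as F


def ad_rl_critic_loss(q_long, q_short_target, batch, gamma=0.99):
    # "aug_long": state stacked with the long queue of pending actions;
    # "next_aug_short": next state in its *short*-delay augmented view.
    with torch.no_grad():
        bootstrap = q_short_target(batch["next_aug_short"]).max(dim=-1).values
        target = batch["reward"] + gamma * (1.0 - batch["done"]) * bootstrap
    q = q_long(batch["aug_long"]).gather(1, batch["action"].long().unsqueeze(1)).squeeze(1)
    return F.mse_loss(q, target)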
arXiv Detail & Related papers (2024-02-05T16:11:03Z) - Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3T} + d^2H^2\mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
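For reference, a generic unadjusted Langevin step of the kind the summary alludes to: a gradient step on a negative log-posterior plus Gaussian noise, which draws approximate posterior samples rather than computing a point estimate. `neg_log_post` is a hypothetical closure; this is a sketch of the general scheme, not the paper's exact sampler.

```python
# One step of unadjusted Langevin dynamics for approximate posterior sampling.
import torch


def langevin_step(theta: torch.Tensor, neg_log_post, step_size: float = 1e-3) -> torch.Tensor:
    theta = theta.detach().requires_grad_(True)
    loss = neg_log_post(theta)                 # scalar negative log-posterior
    (grad,) = torch.autograd.grad(loss, theta)
    noise = torch.randn_like(theta)
    # theta <- theta - eta * grad + sqrt(2 * eta) * N(0, I)
    return (theta - step_size * grad + (2.0 * step_size) ** 0.5 * noise).detach()
```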
arXiv Detail & Related papers (2023-10-29T06:12:43Z) - A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays [24.998122268199797]
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
Our algorithm can tolerate arbitrary excessive delays up to order $T$.
arXiv Detail & Related papers (2023-08-21T12:17:40Z) - Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent [63.43247232708004]
Stochastic gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization.
Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm.
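The object of study can be written in a few lines: delayed SGD applies, at step $t$, a gradient computed at the iterate from $\tau$ steps earlier. A NumPy toy on a quadratic objective, with all constants illustrative:

```python
# Delayed SGD: the update at step t uses a gradient evaluated at the stale
# iterate from tau steps ago, modeling asynchronous training.
import numpy as np


def grad(x):                           # gradient of f(x) = ||x||^2
    return 2.0 * x


rng = np.random.default_rng(0)
x = rng.normal(size=5)
history = [x.copy()]                   # iterates seen so far
eta, tau = 0.05, 3                     # step size and a fixed delay

for t in range(200):
    stale = history[max(0, t - tau)]   # the iterate from tau steps ago
    x = x - eta * grad(stale)          # update with the stale gradient
    history.append(x.copy())

print(np.linalg.norm(x))               # converges when eta and tau are small
```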
arXiv Detail & Related papers (2023-08-18T10:00:27Z) - DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion [137.8749239614528]
We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD.
Taking random temporal proposals as input, it can accurately yield action proposals for an untrimmed long video.
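Schematically, inference follows the standard diffusion pattern: start from random proposals and repeatedly denoise them conditioned on video features. The sketch below uses a hypothetical `denoiser` network and a deliberately simplistic update schedule; it shows the pattern only, not the paper's actual sampler.

```python
# Diffusion-style generation of temporal action proposals, in outline.
import torch


def generate_proposals(denoiser, video_feats, num_proposals=100, steps=10):
    # Proposals as normalized (center, width) pairs, initialized from noise.
    props = torch.randn(num_proposals, 2)
    for t in reversed(range(steps)):
        t_frac = torch.full((num_proposals, 1), t / steps)
        pred = denoiser(props, video_feats, t_frac)  # predicts clean proposals
        props = props + (pred - props) / (t + 1)     # crude denoising step
        props = props.clamp(-3, 3)
    return props.sigmoid()   # map into [0, 1] relative to the video length
```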
arXiv Detail & Related papers (2023-03-27T00:40:52Z) - Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning [28.35473469490186]
Multi-user delay-constrained scheduling is important in many real-world applications, including wireless communication, live streaming, and cloud computing.
We propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($\mathtt{RSD4}$).
$\mathtt{RSD4}$ guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively.
It also efficiently tackles partial observability with a memory mechanism enabled by the recurrent neural network (RNN) and introduces user-level decomposition and node-level merging to ensure scalability.
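The Lagrangian-dual treatment of the resource constraint reduces to two small updates: penalize the reward by a multiplier times resource usage, and move the multiplier toward satisfying the budget by dual ascent. A sketch under assumed names and budget semantics, not the paper's code:

```python
# Lagrangian relaxation of a resource constraint in an RL reward.
def lagrangian_reward(reward: float, resource_used: float, lam: float) -> float:
    return reward - lam * resource_used


def dual_ascent(lam: float, avg_resource_used: float, budget: float, lr: float = 1e-2) -> float:
    # Raise lam when the constraint is violated, decay it otherwise;
    # projection onto [0, inf) keeps the multiplier valid.
    return max(0.0, lam + lr * (avg_resource_used - budget))
```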
arXiv Detail & Related papers (2022-08-30T08:44:15Z) - Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays [10.484851004093919]
This paper formally describes the notion of Markov Decision Processes (MDPs) with delays.
We show that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with a significantly simplified cost structure.
We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with delays in actions and observations.
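The equivalence rests on a simple construction: augment the observation with the buffer of actions that have been chosen but not yet executed. Below is a minimal constant-delay wrapper assuming a Gymnasium-style environment API; the class and its details are illustrative, not the paper's code.

```python
# State augmentation for a constant action delay: the agent observes the
# last available state concatenated with the pending-action buffer, which
# restores the Markov property.
from collections import deque

import numpy as np


class ActionDelayWrapper:
    def __init__(self, env, delay: int):
        self.env, self.delay = env, delay

    def reset(self):
        obs, info = self.env.reset()
        # Pre-fill with no-op actions so the augmented observation has a
        # fixed shape from the first step onward.
        self.buffer = deque([np.zeros_like(self.env.action_space.sample())
                             for _ in range(self.delay)], maxlen=self.delay)
        return self._augment(obs), info

    def step(self, action):
        executed = self.buffer.popleft()       # chosen `delay` steps ago
        self.buffer.append(action)
        obs, rew, term, trunc, info = self.env.step(executed)
        return self._augment(obs), rew, term, trunc, info

    def _augment(self, obs):
        return np.concatenate([obs, *self.buffer])
```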
arXiv Detail & Related papers (2021-08-17T10:45:55Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under mild assumptions on the delays.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR [44.229256049718316]
Streaming attention-based sequence-to-sequence (S2S) models have been proposed to perform online speech recognition with linear-time decoding complexity.
In these models, the decisions to generate tokens are delayed compared to the actual acoustic boundaries since their unidirectional encoders lack future information.
We propose several training strategies that leverage external hard alignments extracted from a hybrid model.
Experiments on the Cortana voice search task demonstrate that our proposed methods can significantly reduce the latency, and even improve the recognition accuracy in certain cases on the decoder side.
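One generic way to exploit such hard alignments is a penalty that fires whenever a token's predicted emission time lags its aligned acoustic boundary; the loss below is an illustrative stand-in for this idea, not one of the paper's specific strategies.

```python
# A latency penalty driven by external hard alignments: tokens emitted
# later than their aligned acoustic boundary incur a hinge-style cost.
import torch
import torch.nn.functional as F


def latency_penalty(emit_times: torch.Tensor, align_times: torch.Tensor) -> torch.Tensor:
    # emit_times, align_times: (batch, tokens), measured in frames
    return F.relu(emit_times - align_times).mean()
```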
arXiv Detail & Related papers (2020-04-10T12:24:49Z)