Model-Based Reinforcement Learning under Random Observation Delays
- URL: http://arxiv.org/abs/2509.20869v1
- Date: Thu, 25 Sep 2025 08:01:13 GMT
- Title: Model-Based Reinforcement Learning under Random Observation Delays
- Authors: Armin Karamzade, Kyungmin Kim, JB Lanier, Davide Corsi, Roy Fox
- Abstract summary: We study random sensor delays in POMDPs, where observations may arrive out-of-sequence. We propose a model-based filtering process that sequentially updates the belief state based on an incoming stream of observations. We then introduce a simple delay-aware framework that incorporates this idea into model-based RL.
- Score: 9.860349466867193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in POMDPs, where observations may arrive out-of-sequence, a setting that has not been previously addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a model-based filtering process that sequentially updates the belief state based on an incoming stream of observations. We then introduce a simple delay-aware framework that incorporates this idea into model-based RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach to delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to delay distribution shifts during deployment. Additionally, we present experiments on simulated robotic tasks, comparing our method to common practical heuristics and emphasizing the importance of explicitly modeling observation delays.
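The core filtering idea, sequentially folding delayed, out-of-sequence observations into a belief state, can be sketched with a toy discrete example. This is a hypothetical two-state HMM for illustration only; the paper applies the idea inside Dreamer's learned latent model, not a tabular one.

```python
# Hypothetical 2-state HMM: transition matrix T and observation likelihoods O.
T = [[0.9, 0.1],
     [0.2, 0.8]]              # T[i][j] = P(next state = j | current state = i)
O = [[0.8, 0.2],
     [0.3, 0.7]]              # O[s][y] = P(observation = y | state = s)

def replay_filter(prior, timed_obs):
    """Recompute the belief by sorting delayed observations into their
    generation order and applying predict/update steps sequentially."""
    belief = list(prior)
    for _, y in sorted(timed_obs):                    # order by generation time
        predicted = [sum(belief[i] * T[i][j] for i in range(2))
                     for j in range(2)]               # predict one step ahead
        updated = [predicted[s] * O[s][y] for s in range(2)]  # correct with obs
        z = sum(updated)
        belief = [p / z for p in updated]             # normalize
    return belief

# Observations arrive out of sequence as (generation_time, value) pairs.
arrivals = [(2, 1), (0, 0), (1, 0)]
belief = replay_filter([0.5, 0.5], arrivals)
print(belief)  # identical to filtering the same observations in order
```

Sorting by generation time before updating is what makes the filter insensitive to arrival order, which is exactly the property naive observation stacking lacks.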
Related papers
- When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift [64.37959940809633]
We study the robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures. We show that Transformer-based sequence policies substantially outperform RNNs and SSMs in robustness, maintaining high returns even when large fractions of sensors are unavailable.
arXiv Detail & Related papers (2026-03-04T22:21:54Z) - Accelerated Sequential Flow Matching: A Bayesian Filtering Perspective [16.29333060724397]
We introduce Sequential Flow Matching, a principled framework grounded in Bayesian filtering. By treating streaming inference as learning a probability flow that transports the predictive distribution from one time step to the next, our approach naturally aligns with the structure of Bayesian belief updates. Our method achieves performance competitive with full-step diffusion while requiring only one or very few sampling steps, and therefore samples faster.
arXiv Detail & Related papers (2026-02-05T05:37:14Z) - SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization [62.958457694151384]
We introduce preference optimization into precipitation nowcasting for the first time, motivated by the success of reinforcement learning from human feedback in large language models. In the first stage, the framework focuses on reducing FAR, training the model to effectively suppress false alarms.
arXiv Detail & Related papers (2025-10-22T16:11:22Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - Reinforcement Learning via Conservative Agent for Environments with Random Delays [2.115993069505241]
We propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This enables any state-of-the-art constant-delay method to be directly extended to random-delay environments without modifying the algorithmic structure or sacrificing performance.
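One way to realize the random-to-constant-delay reformulation described above is to hold each observation back until it is exactly `d_max` steps old before releasing it to the agent. The sketch below is illustrative only (class and method names are hypothetical, not the paper's code):

```python
import heapq

class ConservativeBuffer:
    """Hold observations until they are at least `d_max` steps old, so a
    random-delay observation stream looks like a constant-delay stream.
    Illustrative sketch of the reformulation, not the paper's implementation."""
    def __init__(self, d_max):
        self.d_max = d_max
        self.heap = []                      # (generation_time, observation)

    def push(self, gen_time, obs):
        """Store an arriving observation, keyed by when it was generated."""
        heapq.heappush(self.heap, (gen_time, obs))

    def pop_ready(self, now):
        """Release, in generation order, every observation whose age
        has reached d_max."""
        out = []
        while self.heap and now - self.heap[0][0] >= self.d_max:
            out.append(heapq.heappop(self.heap)[1])
        return out

buf = ConservativeBuffer(d_max=3)
buf.push(0, "o0"); buf.push(2, "o2"); buf.push(1, "o1")
print(buf.pop_ready(now=3))   # ['o0'] -- only o0 has reached age 3
print(buf.pop_ready(now=5))   # ['o1', 'o2']
```

The cost of this conservatism is that every observation is treated as maximally delayed, which is why delay-aware methods that model the actual delay distribution can do better.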
arXiv Detail & Related papers (2025-07-25T06:41:06Z) - Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting [52.6508222408558]
We introduce Elucidated Rolling Diffusion Models (ERDM), the first framework to unify a rolling forecast structure with the principled, performant design of Elucidated Diffusion Models (EDM). On 2D Navier-Stokes simulations and ERA5 global weather forecasting at 1.5° resolution, ERDM consistently outperforms key diffusion-based baselines.
arXiv Detail & Related papers (2025-06-24T21:44:31Z) - Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models [71.63194926457119]
We introduce Dynamical Diffusion (DyDiff), a theoretically sound framework that incorporates temporally aware forward and reverse processes. Experiments across scientific spatiotemporal forecasting, video prediction, and time series forecasting demonstrate that Dynamical Diffusion consistently improves performance in temporal predictive tasks.
arXiv Detail & Related papers (2025-03-02T16:10:32Z) - Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning [15.789898162610529]
Reinforcement learning from human feedback (RLHF) has become a crucial step in building reliable generative AI models. This study develops a disciplined approach to fine-tuning diffusion models using continuous-time RL.
arXiv Detail & Related papers (2025-02-03T20:50:05Z) - Reinforcement Learning from Delayed Observations via World Models [10.298219828693489]
In reinforcement learning settings, agents assume immediate feedback about the effects of their actions after taking them.
In practice, this assumption may not hold true due to physical constraints and can significantly impact the performance of learning algorithms.
We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays.
arXiv Detail & Related papers (2024-03-18T23:18:27Z) - Streaming Motion Forecasting for Autonomous Driving [71.7468645504988]
We introduce a benchmark that queries future trajectories on streaming data, which we refer to as "streaming forecasting".
Our benchmark inherently captures the disappearance and re-appearance of agents, which is a safety-critical problem yet overlooked by snapshot-based benchmarks.
We propose a plug-and-play meta-algorithm called "Predictive Streamer" that can adapt any snapshot-based forecaster into a streaming forecaster.
arXiv Detail & Related papers (2023-10-02T17:13:16Z) - Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? [93.10694819127608]
We propose a unified evaluation pipeline for forecasting methods with real-world perception inputs.
Our in-depth study uncovers a substantial performance gap when transitioning from curated to perception-based data.
arXiv Detail & Related papers (2023-06-15T17:03:14Z) - Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays [10.484851004093919]
This paper formally describes the notion of Markov Decision Processes (MDPs) with delays.
We show that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure.
We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with delays in actions and observations.
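The delayed-to-standard MDP transformation above rests on state augmentation: the agent's effective state is the last observed state together with the actions committed since it was generated. A minimal sketch of that encoding (names and representation are hypothetical, for illustration only):

```python
from collections import deque

def augmented_state(last_obs, pending_actions, d):
    """Delay-resolved augmentation: combine the most recent available
    observation with the d actions taken since it was generated.
    The resulting tuple is Markov even though last_obs is stale."""
    pending = tuple(pending_actions)
    assert len(pending) == d, "must track exactly one action per delay step"
    return (last_obs, pending)

# With a delay of d=2, the agent conditions on the observation from two
# steps ago plus the two actions it has already committed in the meantime.
pending = deque(maxlen=2)
pending.extend([0, 1])
print(augmented_state("obs_t-2", pending, d=2))  # ('obs_t-2', (0, 1))
```

Because the augmented tuple carries all information needed to predict the next augmented tuple, standard (delay-free) RL algorithms can be run on it directly, which is the equivalence the paper exploits.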
arXiv Detail & Related papers (2021-08-17T10:45:55Z) - Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z) - Reinforcement Learning with Random Delays [14.707955337702943]
We show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation.
We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays.
arXiv Detail & Related papers (2020-10-06T18:39:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.