When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift
- URL: http://arxiv.org/abs/2603.04648v1
- Date: Wed, 04 Mar 2026 22:21:54 GMT
- Title: When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift
- Authors: Kevin Vogt-Lowell, Theodoros Tsiligkaridis, Rodney Lafuente-Mercado, Surabhi Ghatti, Shanghua Gao, Marinka Zitnik, Daniela Rus
- Abstract summary: We study robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures. We show Transformer-based sequence policies substantially outperform MLP, RNN, and SSM baselines in robustness, maintaining high returns even when large fractions of sensors are unavailable.
- Score: 64.37959940809633
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Real-world reinforcement learning systems must operate under distributional drift in their observation streams, yet most policy architectures implicitly assume fully observed and noise-free states. We study robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures that induce partial observability and representation shift. To respond to this drift, we augment PPO with temporal sequence models, including Transformers and State Space Models (SSMs), to enable policies to infer missing information from history and maintain performance. Under a stochastic sensor failure process, we prove a high-probability bound on infinite-horizon reward degradation that quantifies how robustness depends on policy smoothness and failure persistence. Empirically, on MuJoCo continuous-control benchmarks with severe sensor dropout, we show Transformer-based sequence policies substantially outperform MLP, RNN, and SSM baselines in robustness, maintaining high returns even when large fractions of sensors are unavailable. These results demonstrate that temporal sequence reasoning provides a principled and practical mechanism for reliable operation under observation drift caused by sensor unreliability.
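As an illustrative sketch (not the authors' code), the temporally persistent sensor-failure process described in the abstract can be modeled as a per-sensor two-state Markov chain: a working sensor fails with some small probability each step and a failed sensor recovers with another, so failures persist over time rather than occurring i.i.d. The failure and recovery rates below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

P_FAIL = 0.05     # per-step probability a working sensor fails (hypothetical)
P_RECOVER = 0.10  # per-step probability a failed sensor recovers (hypothetical)

def step_failure_mask(mask: np.ndarray) -> np.ndarray:
    """Advance each sensor's working/failed state one time step.

    mask[i] == 1.0 means sensor i is working, 0.0 means failed.
    """
    fail = rng.random(mask.shape) < P_FAIL
    recover = rng.random(mask.shape) < P_RECOVER
    # Working sensors may fail; failed sensors may recover.
    return np.where(mask == 1.0, (~fail).astype(float), recover.astype(float))

def corrupt(obs: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out readings from failed sensors, inducing partial observability."""
    return obs * mask

# Roll the failure process over a short trajectory of 8-dim observations.
mask = np.ones(8)
for _ in range(100):
    mask = step_failure_mask(mask)
    obs = corrupt(rng.standard_normal(8), mask)

print(mask)  # 0/1 working state per sensor after 100 steps
```

Because the chain mixes slowly relative to the control horizon, consecutive observations share the same missing sensors, which is what makes history-conditioned (sequence-model) policies able to infer the dropped coordinates.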
Related papers
- On the Plasticity and Stability for Post-Training Large Language Models [54.757672540381236]
We identify a root cause as the conflict between plasticity and stability gradients.
We propose Probabilistic Conflict Resolution (PCR), a framework that models gradients as random variables.
PCR significantly smooths the training trajectory and achieves superior performance in various reasoning tasks.
arXiv Detail & Related papers (2026-02-06T07:31:26Z)
- Analyzing and Improving Diffusion Models for Time-Series Data Imputation: A Proximal Recursion Perspective [45.713195454899875]
Diffusion models (DMs) have shown promise for Time-Series Data Imputation.
DMs' performance remains inconsistent in complex scenarios.
We propose a novel framework called SPIRIT (Semi-Proximal Transport Regularized time-series Imputation).
arXiv Detail & Related papers (2026-02-01T12:11:57Z)
- On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization [0.0]
We analyze the tracking performance of Gradient Descent under uniform strong convexity and smoothness in varying stepsize regimes.
We show that momentum can substantially amplify drift-induced tracking error, with an explicit penalty on the tracking capability.
These results provide a definitive theoretical grounding for the empirical instability of momentum in dynamic environments.
arXiv Detail & Related papers (2026-01-18T03:27:21Z)
- ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework.
It reframes the learning task to predict the residual deviation from an inertial reference.
On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
- The power of dynamic causality in observer-based design for soft sensor applications [0.7965327033045845]
This paper introduces a novel framework for optimizing observer-based soft sensors through dynamic causality analysis.
Traditional approaches to sensor selection often rely on linearized observability indices or statistical correlations that fail to capture the temporal evolution of complex systems.
arXiv Detail & Related papers (2025-09-14T16:27:58Z)
- Anomaly Detection in Complex Dynamical Systems: A Systematic Framework Using Embedding Theory and Physics-Inspired Consistency [0.0]
Anomaly detection in complex dynamical systems is essential for ensuring reliability, safety, and efficiency in industrial and cyber-physical infrastructures.
We propose a system-theoretic approach to anomaly detection, grounded in classical embedding theory and physics-inspired consistency principles.
Our findings support the hypothesis that anomalies disrupt stable system dynamics, providing a robust signal for anomaly detection.
arXiv Detail & Related papers (2025-02-26T17:06:13Z)
- A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics [51.147876395589925]
A non-stationary PGDS is proposed to allow the underlying transition matrices to evolve over time.
A fully-conjugate and efficient Gibbs sampler is developed to perform posterior simulation.
Experiments show that, in comparison with related models, the proposed non-stationary PGDS achieves improved predictive performance.
arXiv Detail & Related papers (2024-02-26T04:39:01Z)
- Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic Dynamical Models with Epistemic Uncertainty [68.00748155945047]
Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers.
Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability.
Our contribution is a novel abstraction-based controller method for continuous-state models with noise, uncertain parameters, and external disturbances.
arXiv Detail & Related papers (2022-10-12T07:57:03Z)
- On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control [47.71156648737803]
Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model.
We characterize a suitably defined Markov chain, identifying that policies associated with Lévy processes with heavier tails yield wider peaks.
arXiv Detail & Related papers (2021-06-15T20:12:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.