Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey
- URL: http://arxiv.org/abs/2602.00399v1
- Date: Fri, 30 Jan 2026 23:25:30 GMT
- Title: Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey
- Authors: Armando Alves Neto
- Abstract summary: Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. Most RL algorithms rely on the Markov Decision Process assumption, which is violated in practical cyber-physical systems. This paper presents a comprehensive survey of RL methods designed to address time delays in control systems.
- Score: 2.3602634041257624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Markov Decision Process assumption, which is violated in practical cyber-physical systems affected by sensing delays, actuation latencies, and communication constraints. Such time delays introduce memory effects that can significantly degrade performance and compromise stability, particularly in networked and multi-agent environments. This paper presents a comprehensive survey of RL methods designed to address time delays in control systems. We first formalize the main classes of delays and analyze their impact on the Markov property. We then systematically categorize existing approaches into five major families: state augmentation and history-based representations, recurrent policies with learned memory, predictor-based and model-aware methods, robust and domain-randomized training strategies, and safe RL frameworks with explicit constraint handling. For each family, we discuss underlying principles, practical advantages, and inherent limitations. A comparative analysis highlights key trade-offs among these approaches and provides practical guidelines for selecting suitable methods under different delay characteristics and safety requirements. Finally, we identify open challenges and promising research directions, including stability certification, large-delay learning, multi-agent communication co-design, and standardized benchmarking. This survey aims to serve as a unified reference for researchers and practitioners developing reliable RL-based controllers in delay-affected cyber-physical systems.
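Of the five families the survey categorizes, state augmentation is the most direct remedy: under a known constant actuation delay of d steps, appending the d pending actions to the observation yields a sufficient statistic and restores the Markov property. A minimal sketch of this idea follows; the wrapper class, the environment interface, and all names are illustrative assumptions, not code from the paper.

```python
from collections import deque

import numpy as np


class DelayAugmentedEnv:
    """Wraps an environment so each action takes effect `delay` steps later.

    The augmented observation (current state + all pending actions) is a
    sufficient statistic, restoring the Markov property for standard RL.
    """

    def __init__(self, env, delay, action_dim):
        self.env = env
        self.delay = delay
        self.action_dim = action_dim
        self.buffer = deque(maxlen=delay)

    def reset(self):
        obs = self.env.reset()
        # Assume the actuation pipeline starts filled with zero (no-op) actions.
        self.buffer.clear()
        for _ in range(self.delay):
            self.buffer.append(np.zeros(self.action_dim))
        return self._augment(obs)

    def step(self, action):
        # The oldest buffered action is the one actually applied this step.
        applied = self.buffer.popleft()
        self.buffer.append(np.asarray(action, dtype=float))
        obs, reward, done = self.env.step(applied)
        return self._augment(obs), reward, done

    def _augment(self, obs):
        # Augmented state: current observation followed by the pending actions.
        return np.concatenate([np.asarray(obs, dtype=float), *self.buffer])
```

The augmented state dimension grows linearly with the delay, which is the scalability limitation that motivates the other families (recurrent memory, predictors) for large or variable delays.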
Related papers
- ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning [75.73135757250806]
Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL remains highly unstable, often leading to training collapse. In this paper, we first propose ARLArena, a stable training recipe and systematic analysis framework that examines training stability in a controlled and reproducible setting.
arXiv Detail & Related papers (2026-02-25T03:43:34Z)
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
- A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control [21.22244612145334]
Diffusion policies have emerged as a powerful approach for robotic control. Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems are studied.
arXiv Detail & Related papers (2026-01-05T05:19:23Z)
- Multi-hop Reasoning via Early Knowledge Alignment [68.28168992785896]
Early Knowledge Alignment (EKA) aims to align Large Language Models with contextually relevant retrieved knowledge. EKA significantly improves retrieval precision, reduces cascading errors, and enhances both performance and efficiency. EKA proves effective as a versatile, training-free inference strategy that scales seamlessly to large models.
arXiv Detail & Related papers (2025-12-23T08:14:44Z)
- Control Synthesis of Cyber-Physical Systems for Real-Time Specifications through Causation-Guided Reinforcement Learning [3.608670495432032]
Signal temporal logic (STL) has emerged as a powerful formalism for expressing real-time constraints. Reinforcement learning (RL) has become an important method for solving control synthesis problems in unknown environments. We propose an online reward generation method guided by the online causation monitoring of STL.
arXiv Detail & Related papers (2025-10-09T02:49:28Z)
- Learning Robust Penetration-Testing Policies under Partial Observability: A Systematic Evaluation [0.28675177318965045]
Penetration testing, the simulation of cyberattacks to identify security vulnerabilities, presents a sequential decision-making problem. Partial observability invalidates the Markov property present in Markov Decision Processes. We investigate partially observable penetration testing scenarios over host networks of varying size, aiming to better reflect real-world complexity.
arXiv Detail & Related papers (2025-09-24T11:27:54Z)
- STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z)
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning [53.85659415230589]
This paper systematically reviews widely adopted reinforcement learning techniques. We present clear guidelines for selecting RL techniques tailored to specific setups. We also reveal that a minimalist combination of two techniques can unlock the learning capability of critic-free policies.
arXiv Detail & Related papers (2025-08-11T17:39:45Z)
- End-to-End Learning Framework for Solving Non-Markovian Optimal Control [13.207458293652635]
We propose an innovative system identification method and control strategy for FOLTI systems. We also develop the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC).
arXiv Detail & Related papers (2025-02-07T04:18:56Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - Investigating Robustness in Cyber-Physical Systems: Specification-Centric Analysis in the face of System Deviations [8.8690305802668]
A critical attribute of cyber-physical systems (CPS) is robustness, denoting its capacity to operate safely.
This paper proposes a novel specification-based robustness, which characterizes the effectiveness of a controller in meeting a specified system requirement.
We present an innovative two-layer simulation-based analysis framework designed to identify subtle robustness violations.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.