Regret-Free Reinforcement Learning for LTL Specifications
- URL: http://arxiv.org/abs/2411.12019v2
- Date: Fri, 06 Jun 2025 14:51:04 GMT
- Title: Regret-Free Reinforcement Learning for LTL Specifications
- Authors: Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani
- Abstract summary: We present the first online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs.
- Score: 6.342676126028222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning to control an unknown dynamical system with respect to high-level temporal specifications is an important problem in control theory. We present the first regret-free online algorithm for learning a controller for linear temporal logic (LTL) specifications for systems with unknown dynamics. We assume that the underlying (unknown) dynamics is modeled by a finite-state and action Markov decision process (MDP). Our core technical result is a regret-free learning algorithm for infinite-horizon reach-avoid problems on MDPs. For general LTL specifications, we show that the synthesis problem can be reduced to a reach-avoid problem once the graph structure is known. Additionally, we provide an algorithm for learning the graph structure, assuming knowledge of a minimum transition probability, which operates independently of the main regret-free algorithm. Our LTL controller synthesis algorithm provides sharp bounds on how close we are to achieving optimal behavior after a finite number of learning episodes. In contrast, previous algorithms for LTL synthesis only provide asymptotic guarantees, which give no insight into the transient performance during the learning phase.
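The infinite-horizon reach-avoid objective at the core of the paper can be made concrete with a small sketch. The code below applies standard value iteration to a *known* finite MDP, computing the maximal probability of reaching a target set while never entering an avoid set; the paper's contribution is achieving this regret-free when the transition probabilities are unknown. The function name and toy MDP are illustrative, not taken from the paper.

```python
import numpy as np

def reach_avoid_values(P, target, avoid, tol=1e-8, max_iter=10_000):
    """Optimal probability of reaching `target` while avoiding `avoid`
    on a fully known finite MDP, computed by value iteration.

    P: transition tensor of shape (S, A, S); P[s, a, s'] = Pr(s' | s, a).
    Returns (optimal values, greedy policy)."""
    S, A, _ = P.shape
    v = np.zeros(S)
    v[list(target)] = 1.0
    for _ in range(max_iter):
        q = P @ v                    # shape (S, A): expected value one step ahead
        v_new = q.max(axis=1)
        v_new[list(target)] = 1.0    # success states are absorbing
        v_new[list(avoid)] = 0.0     # failure states are absorbing
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, (P @ v).argmax(axis=1)

# Toy 3-state MDP: state 0 is initial, state 1 is the target, state 2 is unsafe.
P = np.zeros((3, 2, 3))
P[0, 0] = [0.0, 0.8, 0.2]   # risky action: reach target w.p. 0.8, crash w.p. 0.2
P[0, 1] = [1.0, 0.0, 0.0]   # passive action: stay put forever
P[1, :, 1] = 1.0            # target absorbs under every action
P[2, :, 2] = 1.0            # unsafe state absorbs under every action
v, pi = reach_avoid_values(P, target={1}, avoid={2})
print(v[0], pi[0])          # best reach-avoid probability and action from state 0
```

For a general LTL formula, the abstract's reduction would first build a product of the MDP with an automaton for the formula (once the graph structure is known) and then solve a reach-avoid problem of this kind on the product.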
Related papers
- End-to-End Learning Framework for Solving Non-Markovian Optimal Control [9.156265463755807]
We propose an innovative system identification method and control strategy for fractional-order linear time-invariant (FOLTI) systems.
We also develop the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC).
arXiv Detail & Related papers (2025-02-07T04:18:56Z) - DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL)
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [59.875550175217874]
We show that a simple Model-based Reinforcement Learning scheme achieves strong regret and sample bounds in online and offline RL settings.
We highlight that our algorithms are simple, fairly standard, and indeed have been extensively studied in the RL literature.
arXiv Detail & Related papers (2024-08-16T19:52:53Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Regularization and Optimal Multiclass Learning [10.168670899305232]
This work characterizes the role of regularization in perhaps the simplest setting where empirical risk minimization fails: multiclass learning with arbitrary label sets.
Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles.
arXiv Detail & Related papers (2023-09-24T16:49:55Z) - Signal Temporal Logic Neural Predictive Control [15.540490027770621]
We propose a method to learn a neural network controller that satisfies the requirements specified in Signal Temporal Logic (STL).
Our controller learns to roll out trajectories to maximize the STL robustness score in training.
A backup policy is designed to ensure safety when our controller fails.
arXiv Detail & Related papers (2023-09-10T20:31:25Z) - CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards global optimality [5.0915256711576475]
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
arXiv Detail & Related papers (2022-11-12T10:16:35Z) - Learning to Control under Time-Varying Environment [18.48729114775298]
This paper investigates the problem of regret in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
arXiv Detail & Related papers (2022-06-06T11:40:46Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation [0.0]
In general, a constraint may be imposed on the decision making.
We consider the optimal decision making problems with constraints to complete temporal high-level tasks.
We propose a two-phase constrained DRL algorithm using the Lagrangian relaxation method.
arXiv Detail & Related papers (2022-01-21T00:56:25Z) - Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\,u_i$, with linear dependence in the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
arXiv Detail & Related papers (2021-10-24T08:57:46Z) - Neural Active Learning with Performance Guarantees [37.16062387461106]
We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are generated from a class of functions on which we make no assumptions whatsoever.
We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop.
arXiv Detail & Related papers (2021-06-06T20:44:23Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Exact Asymptotics for Linear Quadratic Adaptive Control [6.287145010885044]
We study the simplest non-bandit reinforcement learning problem: linear quadratic adaptive control (LQAC).
We derive expressions for the regret, estimation error, and prediction error of a stepwise-updating LQAC algorithm.
In simulations on both stable and unstable systems, we find that our theory also describes the algorithm's finite-sample behavior remarkably well.
arXiv Detail & Related papers (2020-11-02T22:43:30Z) - Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z) - Towards Neural-Guided Program Synthesis for Linear Temporal Logic Specifications [26.547133495699093]
We use a neural network to learn a Q-function that is then used to guide search, and to construct programs that are subsequently verified for correctness.
Our method is unique in combining search with deep learning to realize synthesis.
arXiv Detail & Related papers (2019-12-31T17:09:49Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.