Reinforcement Learning for General LTL Objectives Is Intractable
- URL: http://arxiv.org/abs/2111.12679v1
- Date: Wed, 24 Nov 2021 18:26:13 GMT
- Title: Reinforcement Learning for General LTL Objectives Is Intractable
- Authors: Cambridge Yang, Michael Littman, Michael Carbin
- Abstract summary: We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework.
Our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy for non-finite-horizon-decidable LTL objectives.
- Score: 10.69663517250214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, researchers have made significant progress in devising
reinforcement-learning algorithms for optimizing linear temporal logic (LTL)
objectives and LTL-like objectives. Despite these advancements, there are
fundamental limitations to how well this problem can be solved that previous
studies have alluded to but, to our knowledge, have not examined in depth. In
this paper, we address theoretically the hardness of learning with general LTL
objectives. We formalize the problem under the probably approximately correct
learning in Markov decision processes (PAC-MDP) framework, a standard framework
for measuring sample complexity in reinforcement learning. In this
formalization, we prove that the optimal policy for any LTL formula is
PAC-MDP-learnable only if the formula is in the most limited class in the LTL
hierarchy, consisting of only finite-horizon-decidable properties. Practically,
our result implies that it is impossible for a reinforcement-learning algorithm
to obtain a PAC-MDP guarantee on the performance of its learned policy after
finitely many interactions with an unconstrained environment for
non-finite-horizon-decidable LTL objectives.
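For reference, the PAC-MDP criterion that the paper's formalization builds on can be stated roughly as follows. This is a sketch of the standard discounted-reward version (in the style of Strehl and Littman); the paper adapts the notion to LTL objectives, so the exact quantities differ.

```latex
% Rough sketch of the PAC-MDP criterion, stated for discounted rewards; the
% paper's formalization replaces the value function with the probability of
% satisfying the LTL objective. An algorithm producing policies A_t is PAC-MDP
% if, with probability at least 1 - \delta, the number of timesteps at which it
% acts more than \epsilon-suboptimally is polynomially bounded:
\[
  \Bigl|\bigl\{\, t : V^{A_t}(s_t) < V^{*}(s_t) - \epsilon \,\bigr\}\Bigr|
  \;\le\; \mathrm{poly}\!\left(|S|,\, |A|,\, \tfrac{1}{\epsilon},\, \tfrac{1}{\delta},\, \tfrac{1}{1-\gamma}\right).
\]
```

The paper's negative result is that a guarantee of this form is achievable only when the LTL formula lies in the finite-horizon-decidable class.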
Related papers
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process (a minimal sketch of the underlying LDBA-product reward construction appears after this list).
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs [5.946838062187346]
We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs).
We prove a bound on the number of samples our algorithm requires, and we perform experiments that confirm our theory.
arXiv Detail & Related papers (2023-10-18T18:33:41Z) - Computably Continuous Reinforcement-Learning Objectives are PAC-learnable [12.700911432945151]
In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable.
In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards.
This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability.
arXiv Detail & Related papers (2023-03-09T16:05:10Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees [77.67258935234403]
We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning.
We develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-level regularization.
arXiv Detail & Related papers (2020-02-13T15:01:38Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
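Several of the papers above (e.g., Certified Reinforcement Learning with Logic Guidance, Directed Exploration in Reinforcement Learning from Linear Temporal Logic) reduce an LTL objective to a reward signal by running the environment MDP in lockstep with a limit-deterministic Büchi automaton (LDBA) translated from the formula and rewarding accepting automaton transitions. The sketch below is a minimal, hypothetical illustration of that product construction; `env`, `ldba`, and every method on them are assumed interfaces, not any paper's actual API.

```python
# Hypothetical sketch of an LDBA-product reward construction for an LTL objective.
# Assumed interfaces (not a real library):
#   env.reset() -> obs, env.step(action) -> (obs, done),
#   env.label(obs) -> set of atomic propositions true in obs;
#   ldba.initial_state, ldba.delta(q, label) -> q',
#   ldba.accepting(q, label) -> bool (transition-based acceptance).

class LDBAProduct:
    """Environment MDP composed with an LDBA; rewards accepting transitions."""

    def __init__(self, env, ldba, accept_reward=1.0):
        self.env = env
        self.ldba = ldba
        self.accept_reward = accept_reward
        self.q = None  # current automaton state

    def reset(self):
        obs = self.env.reset()
        self.q = self.ldba.initial_state
        return obs, self.q

    def step(self, action):
        obs, done = self.env.step(action)
        label = self.env.label(obs)  # propositions observed this step
        # Reward fires only when the LDBA takes an accepting transition; satisfying
        # the LTL formula corresponds to taking such transitions infinitely often.
        reward = self.accept_reward if self.ldba.accepting(self.q, label) else 0.0
        self.q = self.ldba.delta(self.q, label)
        return (obs, self.q), reward, done
```

As the summaries above note, the reward synthesized this way is typically very sparse, which is what motivates the directed-exploration and guidance techniques in the listed papers.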