A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
- URL: http://arxiv.org/abs/2310.12248v3
- Date: Wed, 21 Feb 2024 01:43:56 GMT
- Title: A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
- Authors: Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
- Abstract summary: We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs).
We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
- Score: 5.946838062187346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL
-- have seen recent use as a way to express non-Markovian objectives in
reinforcement learning. We introduce a model-based probably approximately
correct (PAC) learning algorithm for omega-regular objectives in Markov
decision processes (MDPs). As part of the development of our algorithm, we
introduce the epsilon-recurrence time: a measure of the speed at which a policy
converges to the satisfaction of the omega-regular objective in the limit. We
prove that our algorithm only requires a polynomial number of samples in the
relevant parameters, and perform experiments which confirm our theory.
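To convey the shape of such a guarantee, a generic PAC statement for this setting can be sketched as follows (a sketch only; the paper's exact bound, constants, and parameter list, which the abstract suggests involve the epsilon-recurrence time, are not reproduced here):

\[
\Pr\!\left[\; \mathbb{P}^{\hat{\pi}}_{\mathcal{M}}(\varphi) \;\geq\; \max_{\pi} \mathbb{P}^{\pi}_{\mathcal{M}}(\varphi) \;-\; \epsilon \;\right] \;\geq\; 1 - \delta
\]

where $\mathbb{P}^{\pi}_{\mathcal{M}}(\varphi)$ denotes the probability that policy $\pi$ satisfies the omega-regular objective $\varphi$ in the MDP $\mathcal{M}$, $\hat{\pi}$ is the learned policy, and the number of samples required is polynomial in the relevant parameters (e.g., $1/\epsilon$ and $1/\delta$).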
Related papers
- Regret-Free Reinforcement Learning for LTL Specifications [6.342676126028222]
Reinforcement learning is a promising method to learn optimal control policies for systems with unknown dynamics.
Current RL-based methods offer only asymptotic guarantees, which provide no insight into the transient performance during the learning phase.
We present the first regret-free online algorithm for learning a controller that addresses the general class of LTL specifications over Markov decision processes.
arXiv Detail & Related papers (2024-11-18T20:01:45Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms.
LLMs have achieved remarkable empirical success.
Our proposed framework holds promise for advancing LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z) - Reinforcement Learning for Omega-Regular Specifications on
Continuous-Time MDP [1.8262547855491456]
Continuous-time Markov decision processes (CTMDPs) are canonical models to express sequential decision-making under dense-time environments.
We present an approach enabling correct translation to scalar reward signals for CTMDPs.
arXiv Detail & Related papers (2023-03-16T17:45:38Z) - Computably Continuous Reinforcement-Learning Objectives are
PAC-learnable [12.700911432945151]
In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable.
In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards.
This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability.
arXiv Detail & Related papers (2023-03-09T16:05:10Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy-gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Reinforcement Learning for General LTL Objectives Is Intractable [10.69663517250214]
We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework.
Our result implies it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy.
arXiv Detail & Related papers (2021-11-24T18:26:13Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)