A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
- URL: http://arxiv.org/abs/2310.12248v3
- Date: Wed, 21 Feb 2024 01:43:56 GMT
- Title: A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
- Authors: Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
- Abstract summary: We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs).
We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
- Score: 5.946838062187346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL
-- have seen recent use as a way to express non-Markovian objectives in
reinforcement learning. We introduce a model-based probably approximately
correct (PAC) learning algorithm for omega-regular objectives in Markov
decision processes (MDPs). As part of the development of our algorithm, we
introduce the epsilon-recurrence time: a measure of the speed at which a policy
converges to the satisfaction of the omega-regular objective in the limit. We
prove that our algorithm only requires a polynomial number of samples in the
relevant parameters, and perform experiments which confirm our theory.
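To convey the shape of such a guarantee, a generic PAC statement for this setting can be sketched as follows (a sketch only; the paper's exact bound, constants, and parameter list, which the abstract suggests involve the epsilon-recurrence time, are not reproduced here):

\[
\Pr\!\left[\; \mathbb{P}^{\hat{\pi}}_{\mathcal{M}}(\varphi) \;\geq\; \max_{\pi} \mathbb{P}^{\pi}_{\mathcal{M}}(\varphi) \;-\; \epsilon \;\right] \;\geq\; 1 - \delta
\]

where $\mathbb{P}^{\pi}_{\mathcal{M}}(\varphi)$ denotes the probability that policy $\pi$ satisfies the omega-regular objective $\varphi$ in the MDP $\mathcal{M}$, $\hat{\pi}$ is the learned policy, and the number of samples required is polynomial in the relevant parameters (e.g., $1/\epsilon$ and $1/\delta$).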
Related papers
- Regret-Free Reinforcement Learning for LTL Specifications [6.342676126028222]
Reinforcement learning is a promising method to learn optimal control policies for systems with unknown dynamics.
Current RL-based methods offer only asymptotic guarantees, which provide no insight into the transient performance during the learning phase.
We present the first regret-free online algorithm for learning a controller that addresses the general class of LTL specifications over Markov decision processes.
arXiv Detail & Related papers (2024-11-18T20:01:45Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms.
LLMs have achieved remarkable empirical success.
Our proposed framework holds promise for advancing LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z) - Reinforcement Learning for Omega-Regular Specifications on
Continuous-Time MDP [1.8262547855491456]
Continuous-time Markov decision processes (CTMDPs) are canonical models to express sequential decision-making under dense-time environments.
We present an approach enabling correct translation to scalar reward signals for CTMDPs.
arXiv Detail & Related papers (2023-03-16T17:45:38Z) - Computably Continuous Reinforcement-Learning Objectives are
PAC-learnable [12.700911432945151]
In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable.
In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards.
This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability.
arXiv Detail & Related papers (2023-03-09T16:05:10Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy-gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Reinforcement Learning for General LTL Objectives Is Intractable [10.69663517250214]
We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework.
Our result implies it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy.
arXiv Detail & Related papers (2021-11-24T18:26:13Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)