Reinforcement Learning with LTL and $ω$-Regular Objectives via Optimality-Preserving Translation to Average Rewards
- URL: http://arxiv.org/abs/2410.12175v1
- Date: Wed, 16 Oct 2024 02:42:37 GMT
- Title: Reinforcement Learning with LTL and $ω$-Regular Objectives via Optimality-Preserving Translation to Average Rewards
- Authors: Xuan-Bach Le, Dominik Wagner, Leon Witzman, Alexander Rabinovich, Luke Ong
- Abstract summary: Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discount sum and average reward objectives in reinforcement learning.
We show that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion.
- Score: 43.816375964005026
- License:
- Abstract: Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discount sum and average reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by solving a sequence of discount-sum problems approximately. Consequently, we resolve an open problem: optimal policies for LTL and $\omega$-regular objectives can be learned asymptotically.
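As a concrete (toy) illustration of the second result above, the sketch below solves a small, fully known tabular MDP for the limit-average objective by running standard value iteration for a sequence of discount factors approaching 1; by Blackwell optimality, the greedy discounted policy is also average-reward optimal once the discount is close enough to 1 in a finite MDP. This is only a minimal sketch under strong assumptions (a known transition matrix `P` and reward table `R`, both invented for illustration); it is not the paper's reward-machine construction or its learning algorithm.

```python
# Minimal sketch, NOT the paper's construction: in a known finite MDP, solving
# discount-sum problems with discount factors gamma -> 1 eventually yields a
# stationary policy that is also optimal for the limit-average objective
# (Blackwell optimality).
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
# Toy, randomly generated MDP (purely illustrative): P[s, a] is a probability
# distribution over next states, R[s, a] is the immediate reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def greedy_discounted_policy(P, R, gamma, tol=1e-8, max_iter=50_000):
    """Value iteration for the discount-sum objective; returns the greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)        # Q[s, a] = R[s, a] + gamma * E[V(next state)]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)            # deterministic stationary policy

# As gamma approaches 1, the greedy policy stabilizes to an average-reward-optimal one.
for gamma in (0.9, 0.99, 0.999):
    print(f"gamma={gamma}: policy {greedy_discounted_policy(P, R, gamma)}")
```

The paper's contribution is the learning-theoretic counterpart of this picture: an $\omega$-regular objective is first compiled into a (finite-memory) reward machine whose limit-average value preserves optimality, and the resulting limit-average problem is then attacked through approximately solved discount-sum problems.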
Related papers
- Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces [16.400288624027375]
In many real-world settings, it is important to optimize over multiple objectives simultaneously.
We consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function.
We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large.
arXiv Detail & Related papers (2025-02-17T14:25:33Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit-Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2} \cdot \frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - $f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences [44.91973620442546]
This paper introduces a novel way to encourage exploration called $f$-Policy Gradients.
We show that $f$-PG has better performance compared to standard policy methods on a challenging gridworld.
arXiv Detail & Related papers (2023-10-10T17:07:05Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly-convex objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm [34.77250498401055]
We formulate the problem of maximizing a non-linear concave function of multiple long-term objectives.
A policy-gradient based model-free algorithm is proposed for the problem.
The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum.
arXiv Detail & Related papers (2021-05-28T22:20:54Z) - A Generalised Inverse Reinforcement Learning Framework [24.316047317028147]
The goal of inverse Reinforcement Learning (IRL) is to estimate the unknown cost function of an MDP based on observed trajectories.
We introduce an alternative training loss that puts more weight on future states, which yields a reformulation of the (maximum entropy) IRL problem.
The algorithms we devise perform better than off-the-shelf ones (with similar tractability) in multiple OpenAI Gym environments.
arXiv Detail & Related papers (2021-05-25T10:30:45Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over the objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z) - Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards [15.082715993594121]
We investigate the problem of learning a policy that treats its users equitably.
In this paper, we formulate this novel RL problem, in which an objective function encoding a notion of fairness is optimized.
We describe how several classic deep RL algorithms can be adapted to our fair optimization problem.
arXiv Detail & Related papers (2020-08-18T07:17:53Z)