Computably Continuous Reinforcement-Learning Objectives are
PAC-learnable
- URL: http://arxiv.org/abs/2303.05518v2
- Date: Sun, 19 Mar 2023 16:01:30 GMT
- Title: Computably Continuous Reinforcement-Learning Objectives are
PAC-learnable
- Authors: Cambridge Yang, Michael Littman, Michael Carbin
- Abstract summary: In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable.
In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards.
This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability.
- Score: 12.700911432945151
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning, the classic objectives of maximizing discounted
and finite-horizon cumulative rewards are PAC-learnable: There are algorithms
that learn a near-optimal policy with high probability using a finite amount of
samples and computation. In recent years, researchers have introduced
objectives and corresponding reinforcement-learning algorithms beyond the
classic cumulative rewards, such as objectives specified as linear temporal
logic formulas. However, questions about the PAC-learnability of these new
objectives have remained open.
This work demonstrates the PAC-learnability of general reinforcement-learning
objectives through sufficient conditions for PAC-learnability in two analysis
settings. In particular, for the analysis that considers only sample
complexity, we prove that if an objective given as an oracle is uniformly
continuous, then it is PAC-learnable. Further, for the analysis that considers
computational complexity, we prove that if an objective is computable, then it
is PAC-learnable. In other words, if a procedure computes successive
approximations of the objective's value, then the objective is PAC-learnable.
We give three applications of our condition on objectives from the literature
with previously unknown PAC-learnability and prove that these objectives are
PAC-learnable. Overall, our result helps verify existing objectives'
PAC-learnability. Also, as some studied objectives that are not uniformly
continuous have been shown to be not PAC-learnable, our results could guide the
design of new PAC-learnable objectives.
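To make the sufficient conditions concrete, the following is one common way to read the uniform-continuity requirement. It treats an objective as a bounded function of infinite trajectories and uses the prefix (cylinder) metric; this is an illustrative formalization consistent with the abstract, not necessarily the paper's exact definitions.
```latex
% Illustrative reading (assumed formalization): an objective f maps infinite
% trajectories \tau to bounded reals, and uniform continuity in the prefix
% metric means a long enough finite prefix pins down the value:
\[
  \forall \varepsilon > 0 \;\; \exists T \;\; \forall \tau, \tau' :\;
  \tau_{0:T} = \tau'_{0:T} \;\Longrightarrow\; |f(\tau) - f(\tau')| \le \varepsilon .
\]
% Sanity check with the classic discounted objective
% f(\tau) = \sum_{t \ge 0} \gamma^t r_t with rewards in [0, R_{\max}]:
% if two trajectories agree on their first T steps, their values differ
% only in the tail,
\[
  |f(\tau) - f(\tau')| \;\le\; \frac{\gamma^{T} R_{\max}}{1 - \gamma},
\]
% which is below any \varepsilon for T large enough, so the classic
% discounted objective satisfies the condition.
```
The classic discounted case also illustrates the "finite amount of samples and computation" claim: truncate each rollout at a horizon where the tail bound above is below half the target accuracy, then average enough rollouts for a Hoeffding bound. The sketch below is this standard argument for evaluating a single fixed policy, not the paper's algorithm; `rollout` is a hypothetical sampler.
```python
import math


def truncation_horizon(eps, gamma, r_max):
    """Smallest T with tail bound gamma^T * r_max / (1 - gamma) <= eps / 2."""
    return max(1, math.ceil(math.log(eps * (1.0 - gamma) / (2.0 * r_max))
                            / math.log(gamma)))


def num_rollouts(eps, delta, gamma, r_max):
    """Hoeffding sample size: truncated returns lie in [0, B] with
    B = r_max / (1 - gamma), and n >= 2 B^2 ln(2/delta) / eps^2 rollouts put
    the empirical mean within eps / 2 of its expectation w.p. >= 1 - delta."""
    b = r_max / (1.0 - gamma)
    return math.ceil(2.0 * b * b * math.log(2.0 / delta) / (eps * eps))


def estimate_discounted_return(rollout, eps, delta, gamma, r_max):
    """Estimate E[sum_t gamma^t r_t] for a fixed policy to accuracy eps with
    probability at least 1 - delta, using finitely many samples.

    `rollout(T)` is an assumed sampler that runs the policy for T steps and
    returns the list of rewards, each in [0, r_max].
    """
    horizon = truncation_horizon(eps, gamma, r_max)
    n = num_rollouts(eps, delta, gamma, r_max)
    total = 0.0
    for _ in range(n):
        rewards = rollout(horizon)
        total += sum((gamma ** t) * r for t, r in enumerate(rewards))
    return total / n


if __name__ == "__main__":
    # Toy check: an environment that always emits reward 1 has true value
    # 1 / (1 - 0.9) = 10, and the estimate lands within eps = 0.5 of it.
    print(estimate_discounted_return(lambda T: [1.0] * T,
                                     eps=0.5, delta=0.05, gamma=0.9, r_max=1.0))
```
The computability condition plays the analogous role for general objectives: if a procedure outputs successive approximations of the objective's value, a learner can decide, for any target accuracy, how far to approximate, much as the tail bound above fixes the truncation horizon.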
Related papers
- A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs [5.946838062187346]
We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs).
We prove bounds on the number of samples our algorithm requires and perform experiments that confirm our theory.
arXiv Detail & Related papers (2023-10-18T18:33:41Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior:
From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We build on a generalization bound for meta-learning first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these meta-learning guarantees improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Unified Algorithms for RL with Decision-Estimation Coefficients: PAC, Reward-Free, Preference-Based Learning, and Beyond [28.118197762236953]
We develop a unified algorithmic framework for a large class of learning goals.
Our framework handles many learning goals such as no-regret RL, PAC RL, reward-free learning, model estimation, and preference-based learning.
As applications, we propose "decouplable representation" as a natural sufficient condition for bounding generalized DECs.
arXiv Detail & Related papers (2022-09-23T17:47:24Z) - Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z) - Reinforcement Learning for General LTL Objectives Is Intractable [10.69663517250214]
We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework.
Our result implies that, for general LTL objectives, no reinforcement-learning algorithm can obtain a PAC-MDP guarantee on the performance of its learned policy.
arXiv Detail & Related papers (2021-11-24T18:26:13Z) - C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees [77.67258935234403]
We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning.
We develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-level regularization.
arXiv Detail & Related papers (2020-02-13T15:01:38Z)