Sequential Transfer in Reinforcement Learning with a Generative Model
- URL: http://arxiv.org/abs/2007.00722v1
- Date: Wed, 1 Jul 2020 19:53:35 GMT
- Title: Sequential Transfer in Reinforcement Learning with a Generative Model
- Authors: Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli
- Abstract summary: We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on the sample complexity of our transfer algorithm, which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
- Score: 48.40219742217783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are interested in how to design reinforcement learning agents that
provably reduce the sample complexity for learning new tasks by transferring
knowledge from previously-solved ones. The availability of solutions to related
problems poses a fundamental trade-off: whether to seek policies that are
expected to achieve high (yet sub-optimal) performance in the new task
immediately or whether to seek information to quickly identify an optimal
solution, potentially at the cost of poor initial behavior. In this work, we
focus on the second objective when the agent has access to a generative model
of state-action pairs. First, given a set of solved tasks containing an
approximation of the target one, we design an algorithm that quickly identifies
an accurate solution by seeking the state-action pairs that are most
informative for this purpose. We derive PAC bounds on its sample complexity
which clearly demonstrate the benefits of using this kind of prior knowledge.
Then, we show how to learn these approximate tasks sequentially by reducing our
transfer setting to a hidden Markov model and employing spectral methods to
recover its parameters. Finally, we empirically verify our theoretical findings
in simple simulated domains.
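The idea of querying a generative model at the state-action pairs that best discriminate between previously solved tasks can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes tabular transition models, assumes the target is exactly one of the candidates (the abstract only requires an approximation), ignores rewards, and uses a plain likelihood-ratio elimination rule. All names (`most_informative_pair`, `identify_task`, `sample_next_state`, `make_model`) are hypothetical.

```python
import itertools
import numpy as np

def most_informative_pair(models, active):
    """Return the (state, action) where the active candidates disagree most,
    measured by the largest pairwise total-variation distance."""
    n_states, n_actions, _ = models[0].shape
    gap = np.zeros((n_states, n_actions))
    for i, j in itertools.combinations(active, 2):
        tv = 0.5 * np.abs(models[i] - models[j]).sum(axis=2)
        gap = np.maximum(gap, tv)
    s, a = np.unravel_index(np.argmax(gap), gap.shape)
    return (int(s), int(a)), float(gap[s, a])

def identify_task(models, sample_next_state, delta=1e-6, batch=10, max_batches=1000):
    """Eliminate candidate models until one remains (assumed rule, not the paper's)."""
    active = list(range(len(models)))
    loglik = np.zeros(len(models))
    # A candidate is dropped once it trails the best one by this log-likelihood margin.
    threshold = np.log(len(models) / delta)
    n_queries = 0
    for _ in range(max_batches):
        if len(active) == 1:
            break
        (s, a), gap = most_informative_pair(models, active)
        if gap == 0.0:  # remaining candidates are indistinguishable
            break
        for _ in range(batch):
            s_next = sample_next_state(s, a)  # one call to the generative model
            n_queries += 1
            for k in active:
                loglik[k] += np.log(max(models[k][s, a, s_next], 1e-12))
        best = max(loglik[k] for k in active)
        active = [k for k in active if best - loglik[k] <= threshold]
    return active, n_queries

# Toy setup: 2 states, 2 actions; candidates differ only at (state 0, action 0).
rng = np.random.default_rng(0)

def make_model(p):
    m = np.tile(np.array([0.5, 0.5]), (2, 2, 1))
    m[0, 0] = [p, 1 - p]
    return m

models = [make_model(0.9), make_model(0.5), make_model(0.1)]
true_idx = 1  # index of the (hidden) target task

def sample_next_state(s, a):
    return int(rng.choice(2, p=models[true_idx][s, a]))

active, n_queries = identify_task(models, sample_next_state)
print(active, n_queries)
```

In this toy setup all queries go to the single pair where the candidates differ, so no samples are spent on pairs that cannot separate them.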
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based update through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration [17.27164535440641]
Posterior sampling is a promising approach, but it requires Bayesian inference and dynamic programming.
We show that even though partial models exclude relevant information from the environment, they can nevertheless lead to good policies.
arXiv Detail & Related papers (2023-02-08T18:35:24Z)
- Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z)
- Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that given generative access to source tasks, we can discover a representation, using which subsequent linear RL techniques quickly converge to a near-optimal policy.
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Submodular Meta-Learning [43.15332631500541]
We introduce a discrete variant of the meta-learning framework to improve performance on future tasks.
Our approach aims at using prior data, i.e., previously visited tasks, to train a proper initial solution set.
We show that our framework leads to a significant reduction in computational complexity in solving the new tasks while incurring a small performance loss.
arXiv Detail & Related papers (2020-07-11T21:02:48Z)
- Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and the maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
arXiv Detail & Related papers (2020-03-04T04:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.