On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
- URL: http://arxiv.org/abs/2210.10464v2
- Date: Thu, 29 Jun 2023 03:26:39 GMT
- Title: On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness
- Authors: Haotian Ye, Xiaoyu Chen, Liwei Wang, Simon S. Du
- Abstract summary: Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment.
This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful?
When the interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal.
- Score: 47.09873295916592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization in Reinforcement Learning (RL) aims to learn an agent during
training that generalizes to the target environment. This paper studies RL
generalization from a theoretical aspect: how much can we expect pre-training
over training environments to be helpful? When the interaction with the target
environment is not allowed, we certify that the best we can obtain is a
near-optimal policy in an average sense, and we design an algorithm that
achieves this goal. Furthermore, when the agent is allowed to interact with the
target environment, we give a surprising result showing that asymptotically,
the improvement from pre-training is at most a constant factor. On the other
hand, in the non-asymptotic regime, we design an efficient algorithm and prove
a distribution-based regret bound in the target environment that is independent
of the state-action space.
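To make the no-interaction setting concrete, here is a minimal sketch (not the paper's algorithm, whose details are not given in the abstract): it samples tabular training MDPs from a hypothetical distribution, evaluates a small set of candidate policies exactly on each, and keeps the policy with the highest average value, i.e. a policy that is optimal "in an average sense" over the training distribution.

```python
# Minimal sketch: with no interaction allowed in the target environment, the
# natural objective is a policy that is near-optimal *on average* over the
# training distribution, i.e.  pi* = argmax_pi  E_{M ~ mu}[ V_M^pi ].
# We approximate this by sampling training MDPs and evaluating candidates exactly.
import numpy as np

S, A, H = 4, 2, 5          # states, actions, horizon (illustrative sizes)
rng = np.random.default_rng(0)

def sample_mdp():
    """Draw a tabular finite-horizon MDP from a hypothetical training distribution mu."""
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
    R = rng.uniform(0.0, 1.0, size=(S, A))       # mean reward for each (s, a)
    return P, R

def evaluate(policy, P, R):
    """Exact finite-horizon evaluation of a deterministic policy (array of shape [H, S])."""
    V = np.zeros(S)
    for h in reversed(range(H)):
        a = policy[h]                             # actions chosen at step h, one per state
        V = R[np.arange(S), a] + P[np.arange(S), a] @ V
    return V[0]                                   # value from a fixed initial state s0 = 0

# Candidate policies (in practice these would come from pre-training, not random draws).
candidates = [rng.integers(A, size=(H, S)) for _ in range(50)]
train_mdps = [sample_mdp() for _ in range(200)]   # pre-training environments ~ mu

avg_values = [np.mean([evaluate(pi, P, R) for P, R in train_mdps]) for pi in candidates]
best = candidates[int(np.argmax(avg_values))]     # best policy "in an average sense"
print("best average value over the training distribution:", max(avg_values))
```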
Related papers
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm inspired by evolutionary game theory (EGT).
The resulting algorithm, ERPO, shows faster policy adaptation, higher average rewards, and reduced computational cost.
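As a rough illustration of the idea (not the ERPO algorithm itself; the fitness function and mutation scheme below are invented for the sketch), an evolutionary re-training loop re-evaluates a population of policies in the shifted environment and propagates the fitter members:

```python
# Minimal sketch of an evolutionary re-training loop (illustrative only): a
# population of policies is scored in the shifted environment, and the fitter
# half is kept and mutated to form the next generation.
import numpy as np

rng = np.random.default_rng(1)
DIM, POP, GENS = 8, 20, 30          # policy-parameter dimension, population size, generations

def fitness(theta):
    """Hypothetical stand-in for the average return of policy `theta` in the shifted environment."""
    target = np.full(DIM, 0.5)       # pretend the shifted environment favours these parameters
    return -np.sum((theta - target) ** 2)

population = [rng.normal(size=DIM) for _ in range(POP)]   # e.g. policies from pre-training
for _ in range(GENS):
    scores = np.array([fitness(t) for t in population])
    elite = [population[i] for i in np.argsort(scores)[-POP // 2:]]   # select the fitter half
    offspring = [t + 0.1 * rng.normal(size=DIM) for t in elite]       # mutate the survivors
    population = elite + offspring                                    # next generation

print("best fitness after adaptation:", max(fitness(t) for t in population))
```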
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
- EvIL: Evolution Strategies for Generalisable Imitation Learning [33.745657379141676]
In imitation learning (IL), the environment in which expert demonstrations were collected and the environment in which we want to deploy the learned policy are rarely exactly the same.
Compared to policy-centric approaches to IL like cloning, reward-centric approaches like inverse reinforcement learning (IRL) often better replicate expert behaviour in new environments.
We find that modern deep IL algorithms frequently recover rewards which induce policies far weaker than the expert, even in the same environment the demonstrations were collected in.
We propose a novel evolution-strategies based method EvIL to optimise for a reward-shaping term that speeds up re-training in the target environment.
arXiv Detail & Related papers (2024-06-15T22:46:39Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
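A minimal sketch of what a discretizing bottleneck on goal encodings can look like (a generic vector-quantization step, not the paper's exact architecture; the codebook below is a random placeholder):

```python
# Minimal sketch of a discretizing (vector-quantization-style) bottleneck on goal
# encodings; illustrative only.
import numpy as np

rng = np.random.default_rng(2)
CODES, DIM = 16, 4
codebook = rng.normal(size=(CODES, DIM))      # learned discrete codes (here: random placeholders)

def discretize(goal_embedding):
    """Snap a continuous goal embedding to its nearest codebook entry."""
    dists = np.linalg.norm(codebook - goal_embedding, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]                 # the policy would condition on codebook[idx]

goal = rng.normal(size=DIM)                   # e.g. output of a goal encoder
code_id, quantized_goal = discretize(goal)
print("goal mapped to discrete code", code_id)
```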
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Evolving Curricula with Regret-Based Environment Design [37.70275057075986]
We propose to harness the power of evolution in a principled, regret-based curriculum.
Our approach seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
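A minimal sketch of a regret-prioritized level buffer (illustrative only, not the paper's exact method; the regret proxy and the stored quantities are assumptions): the regret of a level is approximated by the gap between the best return ever observed on it and the current policy's return, and high-regret levels are kept and mutated.

```python
# Minimal sketch of a regret-prioritized level buffer; illustrative only.
import numpy as np

rng = np.random.default_rng(3)

# Each level stores a toy difficulty, the best return ever seen on it, and the
# current policy's return (both would come from rollouts in a real system).
levels = [{"difficulty": d, "best_seen": rng.uniform(0.5, 1.0), "current": rng.uniform(0.0, 0.5)}
          for d in rng.uniform(0, 1, size=32)]

def estimated_regret(level):
    """Regret proxy: best observed return minus the current policy's return."""
    return level["best_seen"] - level["current"]

regrets = np.array([estimated_regret(lv) for lv in levels])
keep = [levels[i] for i in np.argsort(regrets)[-16:]]                    # keep the highest-regret levels
mutated = [dict(lv, difficulty=lv["difficulty"] + rng.normal(0, 0.05)) for lv in keep]
levels = keep + mutated                                                  # evolved level set for the next round
print("mean regret of retained levels:", float(np.mean([estimated_regret(l) for l in keep])))
```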
arXiv Detail & Related papers (2022-03-02T18:40:00Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
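A minimal sketch of the distribution-conditioned interface (illustrative only, not the DisCo RL implementation; the linear policy and the log-likelihood reward are simplifying assumptions): the policy is conditioned on the parameters of a goal distribution, here a diagonal Gaussian, rather than on a single goal vector.

```python
# Minimal sketch of a distribution-conditioned policy; illustrative only.
import numpy as np

rng = np.random.default_rng(4)
STATE_DIM = 3

def policy(state, goal_mean, goal_std, weights):
    """Toy linear policy over the concatenated [state, goal_mean, goal_std] context."""
    context = np.concatenate([state, goal_mean, goal_std])
    return np.tanh(weights @ context)          # action in [-1, 1]^STATE_DIM

def goal_distribution_reward(state, goal_mean, goal_std):
    """One natural reward choice: log-likelihood of the state under the goal distribution."""
    return float(np.sum(-0.5 * ((state - goal_mean) / goal_std) ** 2
                        - np.log(goal_std) - 0.5 * np.log(2 * np.pi)))

weights = rng.normal(size=(STATE_DIM, 3 * STATE_DIM)) * 0.1
state = rng.normal(size=STATE_DIM)
goal_mean, goal_std = np.zeros(STATE_DIM), np.ones(STATE_DIM)   # parameters of the target goal distribution
action = policy(state, goal_mean, goal_std, weights)
print("action:", action, "reward:", goal_distribution_reward(state, goal_mean, goal_std))
```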
arXiv Detail & Related papers (2021-04-23T16:51:58Z)
- When Is Generalizable Reinforcement Learning Tractable? [74.87383727210705]
We study the query complexity required to train RL agents that can generalize to multiple environments.
We introduce Strong Proximity, a structural condition which precisely characterizes the relative closeness of different environments.
We show that under a natural weakening of this condition, RL can require query complexity that is exponential in the horizon to generalize.
arXiv Detail & Related papers (2021-01-01T19:08:24Z)
- Instance based Generalization in Reinforcement Learning [24.485597364200824]
We analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs).
We prove that, independently of the exploration strategy, reusing instances introduces significant changes in the effective Markov dynamics the agent observes during training.
We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance specific exploitation.
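A minimal sketch of collecting data with a consensus over specialized policies (illustrative only; the softmax members and the simple averaging rule are assumptions, not the paper's exact construction):

```python
# Minimal sketch of a consensus policy over an ensemble of specialized policies:
# each member outputs an action distribution, and data is collected with their
# average, which suppresses instance-specific exploitation by any single member.
import numpy as np

rng = np.random.default_rng(5)
N_POLICIES, N_ACTIONS = 4, 3

def member_policy(state, params):
    """Toy specialized policy: a softmax over actions from a linear score."""
    logits = params @ state
    z = np.exp(logits - logits.max())
    return z / z.sum()

ensemble = [rng.normal(size=(N_ACTIONS, 2)) for _ in range(N_POLICIES)]   # one parameter set per training instance
state = rng.normal(size=2)

dists = np.stack([member_policy(state, p) for p in ensemble])
consensus = dists.mean(axis=0)                          # consensus action distribution
action = rng.choice(N_ACTIONS, p=consensus)             # action used for data collection
print("consensus distribution:", np.round(consensus, 3), "sampled action:", action)
```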
arXiv Detail & Related papers (2020-11-02T16:19:44Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
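A minimal sketch of an information-bottleneck-style penalty with an annealed coefficient (illustrative only; the diagonal-Gaussian encoder and the linear annealing schedule are assumptions, not the paper's exact objective): the policy's latent encoding is pushed toward a fixed prior, and the strength of that pressure is increased gradually during training.

```python
# Minimal sketch of a variational information-bottleneck regularizer with annealing.
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the usual variational bottleneck term."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

def annealed_beta(step, total_steps, beta_max=1e-2):
    """Linearly anneal the bottleneck coefficient from 0 to beta_max."""
    return beta_max * min(1.0, step / total_steps)

rng = np.random.default_rng(6)
mu, log_var = rng.normal(size=8) * 0.1, rng.normal(size=8) * 0.1   # encoder outputs for one state
for step in [0, 500, 1000]:
    beta = annealed_beta(step, total_steps=1000)
    penalty = beta * kl_diag_gaussian_to_standard_normal(mu, log_var)
    print(f"step {step}: bottleneck penalty added to the RL loss = {penalty:.5f}")
```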
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.