When Is Generalizable Reinforcement Learning Tractable?
- URL: http://arxiv.org/abs/2101.00300v1
- Date: Fri, 1 Jan 2021 19:08:24 GMT
- Title: When Is Generalizable Reinforcement Learning Tractable?
- Authors: Dhruv Malik, Yuanzhi Li, Pradeep Ravikumar
- Abstract summary: We study the query complexity required to train RL agents that can generalize to multiple environments.
We introduce Strong Proximity, a structural condition which precisely characterizes the relative closeness of different environments.
We show that under a natural weakening of this condition, RL can require query complexity that is exponential in the horizon to generalize.
- Score: 74.87383727210705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents trained by reinforcement learning (RL) often fail to generalize beyond
the environment they were trained in, even when presented with new scenarios
that seem very similar to the training environment. We study the query
complexity required to train RL agents that can generalize to multiple
environments. Intuitively, tractable generalization is only possible when the
environments are similar or close in some sense. To capture this, we introduce
Strong Proximity, a structural condition which precisely characterizes the
relative closeness of different environments. We provide an algorithm which
exploits Strong Proximity to provably and efficiently generalize. We also show
that under a natural weakening of this condition, which we call Weak Proximity,
RL can require query complexity that is exponential in the horizon to
generalize. A key consequence of our theory is that even when the environments
share optimal trajectories, and have highly similar reward and transition
functions (as measured by classical metrics), tractable generalization is
impossible.
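To make the "query complexity exponential in the horizon" scaling concrete, the sketch below uses a standard combination-lock style binary-tree MDP. This is a hypothetical illustration of the general phenomenon, not the paper's actual construction and not its Strong/Weak Proximity conditions: exactly one of the 2^H length-H action sequences is rewarded, so a learner that queries trajectories without structural guidance needs on the order of 2^H episodes to find it.

```python
import random

# Minimal sketch (assumed illustration, not the paper's construction):
# a depth-H binary-tree MDP in which exactly one action sequence of
# length H earns reward 1. Discovering the rewarded leaf by querying
# whole trajectories without extra structure takes ~2^H queries,
# i.e. query complexity exponential in the horizon H.

H = 12                                                   # horizon (tree depth)
secret = tuple(random.randint(0, 1) for _ in range(H))   # the single rewarded leaf

def query(actions):
    """Simulate one episode: reward 1 only if the full action sequence matches."""
    return 1.0 if tuple(actions) == secret else 0.0

def uninformed_search():
    """Enumerate leaves in a fixed order; count queries until reward is found."""
    queries = 0
    for leaf in range(2 ** H):
        actions = [(leaf >> d) & 1 for d in range(H)]
        queries += 1
        if query(actions) == 1.0:
            return queries
    return queries

if __name__ == "__main__":
    print(f"queries used: {uninformed_search()} (worst case 2^H = {2 ** H})")
```

The paper's positive result can be read against this baseline: exploiting a structural condition such as Strong Proximity is what lets an algorithm avoid this brute-force scaling.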
Related papers
- Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations [22.6449779859417]
General intelligence requires quick adaptation across tasks.
In this paper, we explore a wider range of scenarios where not only the distribution but also the environment spaces may change.
We introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively.
arXiv Detail & Related papers (2024-07-30T08:48:49Z) - Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties.
arXiv Detail & Related papers (2024-07-09T17:55:23Z) - Learning Curricula in Open-Ended Worlds [17.138779075998084]
This thesis develops a class of methods called Unsupervised Environment Design (UED).
Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments.
The findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness.
arXiv Detail & Related papers (2023-12-03T16:44:00Z) - Zipfian environments for Reinforcement Learning [19.309119596790563]
We develop three complementary RL environments where the agent's experience varies according to a Zipfian (discrete power law) distribution.
Our results show that learning robustly from skewed experience is a critical challenge for applying Deep RL methods beyond simulations or laboratories.
arXiv Detail & Related papers (2022-03-15T19:59:10Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such environmental changes in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where the learner learns latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - Measuring Generalization with Optimal Transport [111.29415509046886]
We develop margin-based generalization bounds, where the margins are normalized with optimal transport costs.
Our bounds robustly predict the generalization error, given training data and network parameters, on large scale datasets.
arXiv Detail & Related papers (2021-06-07T03:04:59Z) - How Transferable are the Representations Learned by Deep Q Agents? [13.740174266824532]
We consider the source of Deep Reinforcement Learning's sample complexity.
We compare the benefits of transfer learning to learning a policy from scratch.
We find that benefits due to transfer are highly variable in general and non-symmetric across pairs of tasks.
arXiv Detail & Related papers (2020-02-24T00:23:47Z)