Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration
- URL: http://arxiv.org/abs/2302.04250v2
- Date: Thu, 4 May 2023 14:37:36 GMT
- Title: Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration
- Authors: Chentian Jiang, Nan Rosemary Ke, Hado van Hasselt
- Abstract summary: Posterior sampling is a promising approach, but it requires Bayesian inference and dynamic programming.
We show that even though partial models exclude relevant information from the environment, they can nevertheless lead to good policies.
- Score: 17.27164535440641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To generalize across tasks, an agent should acquire knowledge from past tasks
that facilitates adaptation and exploration in future tasks. We focus on the
problem of in-context adaptation and exploration, where an agent only relies on
context, i.e., history of states, actions and/or rewards, rather than
gradient-based updates. Posterior sampling (an extension of Thompson sampling) is
a promising approach, but it requires Bayesian inference and dynamic
programming, which often involve unknowns (e.g., a prior) and costly
computations. To address these difficulties, we use a transformer to learn an
inference process from training tasks and consider a hypothesis space of
partial models, represented as small Markov decision processes that are cheap
for dynamic programming. In our version of the Symbolic Alchemy benchmark, our
method's adaptation speed and exploration-exploitation balance approach those
of an exact posterior sampling oracle. We also show that even though partial
models exclude relevant information from the environment, they can nevertheless
lead to good policies.
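As a hedged illustration of the control loop the abstract describes (not the authors' implementation), the sketch below runs posterior sampling with value iteration on a small sampled MDP. The function sample_partial_mdp stands in for the paper's learned transformer inference process, and env.abstract_state, env.step, and their return conventions are hypothetical names introduced only for this example.

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        # Dynamic programming on a small MDP.
        # P: transition tensor, shape (S, A, S); R: rewards, shape (S, A).
        # Cheap precisely because partial models keep S and A small.
        V = np.zeros(R.shape[0])
        while True:
            Q = R + gamma * (P @ V)          # one-step lookahead, shape (S, A)
            V_new = Q.max(axis=1)
            if np.abs(V_new - V).max() < tol:
                return Q.argmax(axis=1)      # greedy policy over model states
            V = V_new

    def posterior_sampling_episode(env, sample_partial_mdp, context):
        # One episode of posterior sampling: draw a hypothesis conditioned
        # on the context, solve it by value iteration, act greedily, and
        # append the new experience to the in-context history.
        P, R = sample_partial_mdp(context)   # stands in for the transformer
        policy = value_iteration(P, R)
        state, done = env.reset(), False
        while not done:
            action = policy[env.abstract_state(state)]  # hypothetical mapping
            state, reward, done = env.step(action)      # hypothetical API
            context.append((state, action, reward))
        return context

Because the sampled model is only a partial MDP, value iteration here runs over a deliberately small state space, which is what keeps the dynamic-programming step cheap.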
Related papers
- Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values [8.694989771294013]
Policy gradient methods can still be useful in many domains, as long as we can work out how to exploit them in a sample-efficient way.
We explore the chaotic nature of DQNs in reinforcement learning and examine how the information they retain after training can be repurposed to adapt a model to different tasks.
arXiv Detail & Related papers (2024-07-14T21:28:27Z)
- Investigating the role of model-based learning in exploration and transfer [11.652741003589027]
In this paper, we investigate transfer learning in the context of model-based agents.
We find that a model-based approach outperforms controlled model-free baselines for transfer learning.
Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.
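As a hedged illustration of the "intrinsic exploration combined with environment models" idea (a common curiosity-style recipe, not necessarily this paper's exact algorithm), an exploration bonus can be derived from the prediction error of a learned dynamics model; model.predict is a hypothetical one-step predictor.

    import numpy as np

    def intrinsic_reward(model, state, action, next_state, scale=0.1):
        # Curiosity-style bonus: reward the agent for visiting transitions
        # that the learned environment model predicts poorly, so poorly
        # modeled regions are explored more. Added to the extrinsic reward.
        predicted = model.predict(state, action)   # hypothetical API
        error = np.linalg.norm(predicted - next_state)
        return scale * error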
arXiv Detail & Related papers (2023-02-08T11:49:58Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
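For context on what a retrieval-based model is (a minimal illustration under simple assumptions, not the paper's formal setup), prediction can be made from the labels of the nearest stored examples; here keys is an (N, d) array of stored embeddings and labels an integer array of class ids.

    import numpy as np

    def knn_predict(query, keys, labels, k=5):
        # Minimal retrieval-based predictor: retrieve the k stored
        # examples nearest to the query embedding and return the
        # majority label among them.
        dists = np.linalg.norm(keys - query, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = labels[nearest]
        return np.bincount(votes).argmax()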
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
- Learning Neural Models for Natural Language Processing in the Face of Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications.
It builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time.
This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information.
It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime.
arXiv Detail & Related papers (2021-09-03T14:29:20Z)
- Meta-Reinforcement Learning by Tracking Task Non-stationarity [45.90345116853823]
We propose a novel algorithm (TRIO) that optimizes for the future by explicitly tracking the task evolution through time.
Unlike most existing methods, TRIO does not assume Markovian task-evolution processes.
We evaluate our algorithm on different simulated problems and show it outperforms competitive baselines.
arXiv Detail & Related papers (2021-05-18T21:19:41Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
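One plausible reading of adaptive task sampling (a hedged sketch, not the paper's exact scheme) is to draw the next meta-training task with probability increasing in its current difficulty, e.g. a running average of the model's loss on each task.

    import numpy as np

    def sample_task(task_losses, temperature=1.0, rng=None):
        # Hedged sketch: softmax over per-task losses, so harder tasks
        # (higher loss) are revisited more often during meta-training.
        # task_losses: sequence mapping task index -> running avg loss.
        if rng is None:
            rng = np.random.default_rng()
        logits = np.asarray(task_losses, dtype=float) / temperature
        probs = np.exp(logits - logits.max())   # stable softmax
        probs /= probs.sum()
        return rng.choice(len(task_losses), p=probs)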
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
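The insight that dynamics models can be fit consistently from off-policy data can be sketched as plain supervised regression on replayed transitions (hypothetical names and a deliberately simple linear model, not the authors' implementation): any stored (s, a, s') triple is valid supervision for the mapping (s, a) -> s', regardless of which policy collected it.

    import numpy as np

    def fit_linear_dynamics(transitions):
        # Hedged sketch: fit a linear one-step dynamics model by ordinary
        # least squares on off-policy transitions (s, a, s_next).
        # A bias feature could be appended to each row for an offset term.
        X = np.array([np.concatenate([s, a]) for s, a, _ in transitions])
        Y = np.array([s_next for _, _, s_next in transitions])
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return W  # predict with np.concatenate([s, a]) @ W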
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)