Learning How to Infer Partial MDPs for In-Context Adaptation and
Exploration
- URL: http://arxiv.org/abs/2302.04250v2
- Date: Thu, 4 May 2023 14:37:36 GMT
- Title: Learning How to Infer Partial MDPs for In-Context Adaptation and
Exploration
- Authors: Chentian Jiang, Nan Rosemary Ke, Hado van Hasselt
- Abstract summary: Posterior sampling is a promising approach, but it requires Bayesian inference and dynamic programming.
We show that even though partial models exclude relevant information from the environment, they can nevertheless lead to good policies.
- Score: 17.27164535440641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To generalize across tasks, an agent should acquire knowledge from past tasks
that facilitates adaptation and exploration in future tasks. We focus on the
problem of in-context adaptation and exploration, where an agent only relies on
context, i.e., history of states, actions and/or rewards, rather than
gradient-based updates. Posterior sampling (an extension of Thompson sampling) is
a promising approach, but it requires Bayesian inference and dynamic
programming, which often involve unknowns (e.g., a prior) and costly
computations. To address these difficulties, we use a transformer to learn an
inference process from training tasks and consider a hypothesis space of
partial models, represented as small Markov decision processes that are cheap
for dynamic programming. In our version of the Symbolic Alchemy benchmark, our
method's adaptation speed and exploration-exploitation balance approach those
of an exact posterior sampling oracle. We also show that even though partial
models exclude relevant information from the environment, they can nevertheless
lead to good policies.
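To make the planning component concrete, below is a minimal, hypothetical sketch (Python/NumPy) of posterior-sampling control over a hypothesis space of small tabular MDPs: sample a hypothesis from a belief, solve it cheaply with value iteration, act greedily while logging the context. The names (`value_iteration`, `posterior_sampling_episode`, the gym-like `env` interface, the explicit `posterior` vector) are illustrative assumptions, not the paper's implementation; in the paper, a learned transformer plays the role of the posterior update.

```python
# Minimal sketch (assumptions, not the paper's code): posterior sampling
# over a small set of partial-MDP hypotheses, each cheap to solve exactly.
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """Dynamic programming on a small tabular MDP.

    P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    Returns a deterministic greedy policy of shape (S,).
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * np.einsum("sat,t->sa", P, V)  # one Bellman backup
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def posterior_sampling_episode(env, hypotheses, posterior, gamma=0.95, rng=None):
    """One episode of Thompson-style control with partial-MDP hypotheses.

    hypotheses: list of (P, R) pairs defining small MDPs.
    posterior: probability vector over hypotheses (assumed given here; in
               the paper a transformer maps context to this belief).
    """
    rng = rng or np.random.default_rng()
    P, R = hypotheses[rng.choice(len(hypotheses), p=posterior)]
    policy = value_iteration(P, R, gamma)      # plan in the sampled model
    context, s, done = [], env.reset(), False
    while not done:                            # act greedily, log the context
        a = int(policy[s])
        s_next, r, done = env.step(a)          # assumed gym-like interface
        context.append((s, a, r, s_next))
        s = s_next
    return context                             # used to update the belief
```

Because each hypothesis is a deliberately small (partial) MDP, the planning step stays cheap even when repeated every episode; the expensive part, inferring the belief over hypotheses from context, is what the learned inference process approximates.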
Related papers
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
- Task Vectors in In-Context Learning: Emergence, Formation, and Benefit [17.72043522825441]
We investigate the formation of task vectors in a controlled setting using models trained from scratch on synthetic datasets.
Our findings confirm that task vectors naturally emerge under certain conditions, but the tasks may be relatively weakly and/or non-locally encoded within the model.
To promote strong task vectors encoded at a prescribed location within the model, we propose an auxiliary training mechanism based on a task vector prompting loss.
arXiv Detail & Related papers (2025-01-16T01:54:23Z)
- Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values [8.694989771294013]
Policy gradient methods can still be useful in many domains, as long as we can work out how to exploit them in a sample-efficient way.
We explore the chaotic nature of DQNs in reinforcement learning and examine how the information they retain after training can be repurposed to adapt a model to different tasks.
arXiv Detail & Related papers (2024-07-14T21:28:27Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
- Meta-Reinforcement Learning by Tracking Task Non-stationarity [45.90345116853823]
We propose a novel algorithm (TRIO) that optimizes for the future by explicitly tracking the task evolution through time.
Unlike most existing methods, TRIO does not assume Markovian task-evolution processes.
We evaluate our algorithm on different simulated problems and show it outperforms competitive baselines.
arXiv Detail & Related papers (2021-05-18T21:19:41Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)