Meta-trained agents implement Bayes-optimal agents
- URL: http://arxiv.org/abs/2010.11223v1
- Date: Wed, 21 Oct 2020 18:05:21 GMT
- Title: Meta-trained agents implement Bayes-optimal agents
- Authors: Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein,
Miljan Martic, Shane Legg, Pedro A. Ortega
- Abstract summary: Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents.
- Score: 13.572630988699572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Memory-based meta-learning is a powerful technique to build agents that adapt
fast to any task within a target distribution. A previous theoretical study has
argued that this remarkable performance is because the meta-training protocol
incentivises agents to behave Bayes-optimally. We empirically investigate this
claim on a number of prediction and bandit tasks. Inspired by ideas from
theoretical computer science, we show that meta-learned and Bayes-optimal
agents not only behave alike, but they even share a similar computational
structure, in the sense that one agent system can approximately simulate the
other. Furthermore, we show that Bayes-optimal agents are fixed points of the
meta-learning dynamics. Our results suggest that memory-based meta-learning
might serve as a general technique for numerically approximating Bayes-optimal
agents - that is, even for task distributions for which we currently don't
possess tractable models.
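To make the abstract's claim concrete, here is a minimal, illustrative sketch (not code from the paper) of the kind of Bayes-optimal predictor that a memory-based meta-learner is argued to approximate. The task distribution, the uniform Beta(1, 1) prior, and all function names are assumptions chosen for simplicity: tasks are biased coins with unknown bias, and the exact Bayes-optimal next-flip prediction is the posterior predictive mean, i.e. Laplace's rule of succession. A recurrent agent meta-trained to minimise log loss across such tasks would, per the paper's argument, converge towards this same predictor.

```python
import math
import random


def bayes_optimal_prediction(heads: int, flips: int) -> float:
    """Posterior predictive P(next flip = heads) under a uniform Beta(1,1) prior."""
    return (heads + 1) / (flips + 2)  # Laplace's rule of succession


def task_log_loss(n_flips: int, rng: random.Random) -> float:
    """Average log loss of the Bayes-optimal predictor on one sampled task."""
    theta = rng.random()  # sample a task: an unknown coin bias
    heads, total = 0, 0.0
    for t in range(n_flips):
        p = bayes_optimal_prediction(heads, t)
        x = 1 if rng.random() < theta else 0  # observe one flip
        total += -math.log(p if x else 1.0 - p)
        heads += x
    return total / n_flips


print(bayes_optimal_prediction(0, 0))  # 0.5: with no data, predict the prior mean
print(bayes_optimal_prediction(3, 4))  # 2/3 after observing 3 heads in 4 flips
```

The point of the paper's fixed-point result, in this toy setting, is that once the agent's memory state encodes the sufficient statistics `(heads, flips)` and its output matches the posterior predictive, further meta-training has nothing left to improve.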
Related papers
- AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model.
We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios.
By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting the strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z) - Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability [10.548824172738227]
Learning a compact representation of history is critical for planning and generalization in partially observable environments.
We show that meta-reinforcement learning (RL) agents can attain near Bayes-optimal policies, but often fail to learn the compact, interpretable Bayes-optimal belief states.
We investigate whether integrating self-supervised predictive coding modules into meta-RL can facilitate learning of Bayes-optimal representations.
arXiv Detail & Related papers (2025-10-24T21:45:56Z) - ContraBAR: Contrastive Bayes-Adaptive Deep RL [22.649531458557206]
In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy -- the optimal policy when facing an unknown task.
We investigate whether contrastive methods can be used for learning Bayes-optimal behavior.
We propose a simple meta RL algorithm that uses contrastive predictive coding (CPC) in lieu of variational belief inference.
arXiv Detail & Related papers (2023-06-04T17:50:20Z) - MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z) - Bayesian Meta-Learning Through Variational Gaussian Processes [0.0]
We extend Gaussian-process-based meta-learning to allow for high-quality, arbitrary non-Gaussian uncertainty predictions.
Our method performs significantly better than existing Bayesian meta-learning baselines.
arXiv Detail & Related papers (2021-10-21T10:44:23Z) - Energy-Efficient and Federated Meta-Learning via Projected Stochastic
Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z) - What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
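As an illustrative aside (not code from the paper above), the POMDP framing it invokes says that an optimal agent should track a belief state, updated by the standard Bayes filter b'(s') ∝ O(o | s') · Σ_s T(s' | s, a) · b(s); the recurrent agent's hidden state is argued to approximate this quantity. A minimal sketch over a discrete state space, with a made-up two-state "noisy listening" example:

```python
def belief_update(belief, T, O, action, obs):
    """One discrete Bayes filter step: b'(s') ~ O[s'][obs] * sum_s T[a][s][s'] * b[s].

    belief: list of P(s); T[a][s][s2]: transition probs; O[s][o]: observation probs.
    """
    n = len(belief)
    new_b = [
        O[s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(new_b)  # normalising constant P(obs | belief, action)
    return [b / z for b in new_b]


# Two hidden states (reward behind left or right door), one "listen" action
# that leaves the state unchanged and yields an 85%-accurate observation.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[0.85, 0.15], [0.15, 0.85]]
b = belief_update([0.5, 0.5], T, O, action=0, obs=0)
print(b)  # [0.85, 0.15]: the posterior shifts toward state 0 after hearing "left"
```

A trained RNN never computes this update explicitly; the paper's structural-similarity claim is that its hidden-state dynamics can be approximately mapped onto updates of this form.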
arXiv Detail & Related papers (2021-04-29T20:34:39Z) - Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
arXiv Detail & Related papers (2021-01-11T13:25:13Z) - Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML, or Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.