Deep Interactive Bayesian Reinforcement Learning via Meta-Learning
- URL: http://arxiv.org/abs/2101.03864v1
- Date: Mon, 11 Jan 2021 13:25:13 GMT
- Title: Deep Interactive Bayesian Reinforcement Learning via Meta-Learning
- Authors: Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann
- Abstract summary: The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
- Score: 63.96201773395921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents that interact with other agents often do not know a priori what the
other agents' strategies are, but have to maximise their own online return
while interacting with and learning about others. The optimal adaptive
behaviour under uncertainty over the other agents' strategies w.r.t. some prior
can in principle be computed using the Interactive Bayesian Reinforcement
Learning framework. Unfortunately, doing so is intractable in most settings,
and existing approximation methods are restricted to small tasks. To overcome
this, we propose to meta-learn approximate belief inference and Bayes-optimal
behaviour for a given prior. To model beliefs over other agents, we combine
sequential and hierarchical Variational Auto-Encoders, and meta-train this
inference model alongside the policy. We show empirically that our approach
outperforms existing methods that use a model-free approach, sample from the
approximate posterior, maintain memory-free models of others, or do not fully
utilise the known structure of the environment.
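To make the abstract's recipe concrete, below is a minimal PyTorch sketch of the two meta-trained components it names: a sequential encoder that maps the interaction history to an approximate belief over a latent summarising the other agents' strategies, and a policy conditioned on that belief. Everything here is an illustrative assumption rather than the authors' implementation: the module names (`BeliefEncoder`, `NextObsDecoder`, `BeliefConditionedPolicy`) are hypothetical, a single-level latent stands in for the paper's combined sequential and hierarchical VAEs, and next-observation reconstruction is just one plausible VAE target.

```python
# Minimal sketch (assumptions throughout, not the authors' code) of
# meta-learning approximate belief inference with a VAE alongside a policy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefEncoder(nn.Module):
    """Sequential encoder: maps the interaction history to a Gaussian
    belief over a latent summarising the other agents' strategies."""
    def __init__(self, obs_dim, act_dim, latent_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim + 1, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, obs, acts, rews):
        # history: concatenated (observation, own action, reward) per step
        h, _ = self.rnn(torch.cat([obs, acts, rews], dim=-1))
        h_t = h[:, -1]                      # belief after the latest step
        return self.mu(h_t), self.logvar(h_t)

class NextObsDecoder(nn.Module):
    """Predicts the next observation from (obs, action, latent); its
    reconstruction loss plus a KL term trains the encoder as a VAE."""
    def __init__(self, obs_dim, act_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim))

    def forward(self, obs, act, z):
        return self.net(torch.cat([obs, act, z], dim=-1))

class BeliefConditionedPolicy(nn.Module):
    """Policy that conditions on the belief itself (mean and variance)
    rather than on a single posterior sample."""
    def __init__(self, obs_dim, latent_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2 * latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs, mu, logvar):
        return self.net(torch.cat([obs, mu, logvar], dim=-1))

# One meta-training step on a dummy batch (batch=8, horizon=5).
obs_dim, act_dim, latent_dim, n_actions = 4, 2, 3, 2
enc = BeliefEncoder(obs_dim, act_dim, latent_dim)
dec = NextObsDecoder(obs_dim, act_dim, latent_dim)
pi = BeliefConditionedPolicy(obs_dim, latent_dim, n_actions)

obs = torch.randn(8, 5, obs_dim); acts = torch.randn(8, 5, act_dim)
rews = torch.randn(8, 5, 1); next_obs = torch.randn(8, obs_dim)

mu, logvar = enc(obs, acts, rews)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
recon = F.mse_loss(dec(obs[:, -1], acts[:, -1], z), next_obs)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
elbo_loss = recon + kl               # trains the inference model
logits = pi(obs[:, -1], mu, logvar)  # policy trained with RL on top
```

In this sketch the encoder and decoder are trained on the ELBO while the policy is trained with any standard RL objective on top of the belief; meta-training would repeat this across many sampled opponents from the prior.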
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning [4.40301653518681]
Agent-based models (ABMs) have shown promise for modelling various real-world phenomena that are incompatible with traditional equilibrium analysis.
Recent developments in multi-agent reinforcement learning (MARL) offer a way to address this issue from a rationality perspective.
We propose a novel technique for representing heterogeneous processing-constrained agents within a MARL framework.
arXiv Detail & Related papers (2024-02-01T17:21:45Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Concept Learning for Interpretable Multi-Agent Reinforcement Learning [5.179808182296037]
We introduce a method for incorporating interpretable concepts from a domain expert into models trained through multi-agent reinforcement learning.
This allows an expert both to reason about the resulting concept policy models in terms of these high-level concepts at run-time, and to intervene and correct mispredictions to improve performance.
We show that this yields improved interpretability and training stability, with benefits to policy performance and sample efficiency in a simulated and real-world cooperative-competitive multi-agent game.
arXiv Detail & Related papers (2023-02-23T18:53:09Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copulas, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model separately learns marginals that capture the local behavioral patterns of each individual agent, and a copula function that fully captures the dependence structure among agents (a minimal numerical sketch follows this list).
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods (a schematic sketch of its conservative critic objective also appears after this list).
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
arXiv Detail & Related papers (2020-10-31T22:50:21Z)
- Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
Because the other agents are simultaneously learning, the environment appears non-stationary from any single agent's perspective, which poses a great challenge for value-function-based algorithms whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL).
arXiv Detail & Related papers (2020-06-06T17:19:04Z)
- Variational Autoencoders for Opponent Modeling in Multi-Agent Systems [9.405879323049659]
Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment.
In this work, we are interested in controlling one agent in a multi-agent system and learning to interact successfully with the other agents, which have fixed policies.
Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system.
arXiv Detail & Related papers (2020-01-29T13:38:59Z)
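The copula entry above separates each agent's marginal behaviour from the dependence between agents. As a minimal numerical illustration (not the paper's trainable model), the sketch below fits two hypothetical agents' marginals independently, estimates a Gaussian-copula correlation from the data, and then samples coordinated joint actions; the choice of distributions and the two-agent setup are assumptions for readability.

```python
# Gaussian-copula sketch on synthetic data: marginals model each agent's
# local behaviour, the copula models only the coordination between agents.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical correlated 1-D continuous "actions" for two agents.
latent = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=5000)
data = np.column_stack([latent[:, 0] * 2 + 1, np.exp(latent[:, 1])])

# 1) Fit each agent's marginal independently (local behavioural pattern).
m0 = stats.norm.fit(data[:, 0])        # returns (loc, scale)
m1 = stats.lognorm.fit(data[:, 1])     # returns (s, loc, scale)

# 2) Map data to uniforms via the fitted marginal CDFs, then to normal
#    scores; the score correlation is the Gaussian-copula parameter,
#    i.e. the dependence structure separated from the marginals.
u = np.column_stack([stats.norm.cdf(data[:, 0], *m0),
                     stats.lognorm.cdf(data[:, 1], *m1)])
z = stats.norm.ppf(np.clip(u, 1e-6, 1 - 1e-6))
rho = np.corrcoef(z, rowvar=False)[0, 1]

# 3) Sample coordinated joint actions: draw from the copula, then push
#    each coordinate through its own inverse marginal CDF.
g = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5)
joint = np.column_stack([stats.norm.ppf(stats.norm.cdf(g[:, 0]), *m0),
                         stats.lognorm.ppf(stats.norm.cdf(g[:, 1]), *m1)])
print(f"estimated copula correlation: {rho:.2f}")
print(joint)
```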
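For the COMBO entry, the key idea is a critic objective that is pessimistic on state-actions the offline dataset does not support. Below is a minimal PyTorch sketch of that kind of objective, assuming random placeholder batches and omitting the dynamics model, rollout machinery, and actor update; it is schematic, not the official implementation, and the names and `beta` weighting are illustrative.

```python
# Schematic conservative critic objective in the spirit of COMBO:
# push Q down on model-generated (possibly out-of-support) state-actions,
# push Q up on dataset state-actions, alongside a standard Bellman error.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 4, 2
q = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                  nn.Linear(64, 1))

def conservative_critic_loss(real_sa, real_target, model_sa, beta=1.0):
    """Bellman error on offline data plus a conservative gap term."""
    bellman = F.mse_loss(q(real_sa), real_target)
    gap = q(model_sa).mean() - q(real_sa).mean()
    return bellman + beta * gap

# Dummy batches: offline transitions vs. short model rollouts.
real_sa = torch.randn(32, obs_dim + act_dim)
real_target = torch.randn(32, 1)   # placeholder for r + gamma * Q_target
model_sa = torch.randn(32, obs_dim + act_dim)

loss = conservative_critic_loss(real_sa, real_target, model_sa)
loss.backward()
```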