Beyond Rewards: a Hierarchical Perspective on Offline Multiagent
Behavioral Analysis
- URL: http://arxiv.org/abs/2206.09046v1
- Date: Fri, 17 Jun 2022 23:07:33 GMT
- Title: Beyond Rewards: a Hierarchical Perspective on Offline Multiagent
Behavioral Analysis
- Authors: Shayegan Omidshafiei, Andrei Kapishnikov, Yannick Assogba, Lucas
Dixon, Been Kim
- Abstract summary: We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains.
Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or models, and can be trained using entirely offline observational data.
- Score: 14.656957226255628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Each year, expert-level performance is attained in increasingly complex
multiagent domains, with notable examples including Go, Poker, and StarCraft II.
This rapid progression is accompanied by a commensurate need to better
understand how such agents attain this performance, to enable their safe
deployment, identify limitations, and reveal potential means of improving them.
In this paper we take a step back from performance-focused multiagent learning,
and instead turn our attention towards agent behavior analysis. We introduce a
model-agnostic method for discovery of behavior clusters in multiagent domains,
using variational inference to learn a hierarchy of behaviors at the joint and
local agent levels. Our framework makes no assumption about agents' underlying
learning algorithms, does not require access to their latent states or models,
and can be trained using entirely offline observational data. We illustrate the
effectiveness of our method for enabling the coupled understanding of behaviors
at the joint and local agent levels, detection of behavior changepoints
throughout training, and discovery of core behavioral concepts (e.g., those that
facilitate higher returns), and we demonstrate the approach's scalability to a
high-dimensional multiagent MuJoCo control domain.
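The abstract outlines the general recipe: encode offline multiagent trajectories into a variational hierarchy of joint and per-agent (local) latents, then analyze the learned embeddings. Below is a minimal PyTorch sketch of that general shape; the module names, dimensions, single-step action decoder, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative sketch of a hierarchical behavior VAE for offline
# multiagent trajectories (assumptions only; not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalEncoder(nn.Module):
    """Encodes one agent's (obs, action) trajectory into a local latent distribution."""
    def __init__(self, obs_dim, act_dim, hidden=64, z_dim=8):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, traj):                      # traj: [B, T, obs_dim + act_dim]
        _, h = self.rnn(traj)                     # h: [1, B, hidden]
        h = h.squeeze(0)
        return self.mu(h), self.logvar(h)

class HierarchicalBehaviorVAE(nn.Module):
    """Joint latent summarizes the team; local latents summarize each agent."""
    def __init__(self, n_agents, obs_dim, act_dim, z_local=8, z_joint=8):
        super().__init__()
        self.local_encoders = nn.ModuleList(
            [LocalEncoder(obs_dim, act_dim, z_dim=z_local) for _ in range(n_agents)])
        self.joint_mu = nn.Linear(n_agents * z_local, z_joint)
        self.joint_logvar = nn.Linear(n_agents * z_local, z_joint)
        # Toy decoder: reconstruct a per-agent action summary from [joint, local] latents.
        self.decoder = nn.Linear(z_joint + z_local, act_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, trajs):                     # trajs: [B, n_agents, T, obs_dim + act_dim]
        local_mu, local_lv, local_z = [], [], []
        for i, enc in enumerate(self.local_encoders):
            mu, lv = enc(trajs[:, i])
            local_mu.append(mu)
            local_lv.append(lv)
            local_z.append(self.reparameterize(mu, lv))
        pooled = torch.cat(local_mu, dim=-1)
        j_mu, j_lv = self.joint_mu(pooled), self.joint_logvar(pooled)
        z_joint = self.reparameterize(j_mu, j_lv)
        recon = torch.stack(
            [self.decoder(torch.cat([z_joint, z], dim=-1)) for z in local_z], dim=1)
        return recon, (j_mu, j_lv), (local_mu, local_lv)

def elbo_loss(recon, target_actions, joint_stats, local_stats, beta=1e-3):
    """Reconstruction term plus KL terms on the joint and local latents."""
    rec = F.mse_loss(recon, target_actions)
    kl = lambda mu, lv: -0.5 * torch.mean(1 + lv - mu.pow(2) - lv.exp())
    j_mu, j_lv = joint_stats
    total_kl = kl(j_mu, j_lv) + sum(kl(m, l) for m, l in zip(*local_stats))
    return rec + beta * total_kl
```

One simple follow-up analysis under these assumptions is to run k-means over the learned joint and local latent means of a dataset's trajectories to obtain candidate behavior clusters.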
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which uses step-wise rewards to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- On Diagnostics for Understanding Agent Training Behaviour in Cooperative MARL [5.124364759305485]
We argue that relying solely on the empirical returns may obscure crucial insights into agent behaviour.
In this paper, we explore the application of explainable AI (XAI) tools to gain profound insights into agent behaviour.
arXiv Detail & Related papers (2023-12-13T19:10:10Z)
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Emergent Behaviors in Multi-Agent Target Acquisition [0.0]
We simulate a Multi-Agent System (MAS) using Reinforcement Learning (RL) in a pursuit-evasion game.
We create different adversarial scenarios by replacing RL-trained pursuers' policies with two distinct (non-RL) analytical strategies.
The novelty of our approach lies in the creation of an influential feature set that reveals underlying data regularities.
arXiv Detail & Related papers (2022-12-15T15:20:58Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent trained with offline MARL can inherit a random or low-quality policy present in the pre-collected data, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem; a generic sketch of the underlying trajectory-filtering idea appears after this list.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
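As referenced in the Shared Individual Trajectories (SIT) entry above, the following is a generic sketch of the premise behind learning from good trajectories in offline MARL: rank each agent's individual trajectories by return and keep only the better ones, so that a random teammate's data does not dominate training. The dataset layout and keep_fraction threshold are assumptions for illustration; this is not the SIT algorithm itself.

```python
# Generic sketch (assumptions only; not the SIT algorithm): keep each agent's
# higher-return individual trajectories before offline multi-agent training.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class AgentTrajectory:
    observations: np.ndarray   # [T, obs_dim]
    actions: np.ndarray        # [T, act_dim]
    rewards: np.ndarray        # [T]

def filter_good_trajectories(per_agent_data: List[List[AgentTrajectory]],
                             keep_fraction: float = 0.5):
    """Keep the top-`keep_fraction` of each agent's trajectories by total return."""
    filtered = []
    for trajectories in per_agent_data:          # one list of trajectories per agent
        returns = np.array([t.rewards.sum() for t in trajectories])
        cutoff = np.quantile(returns, 1.0 - keep_fraction)
        filtered.append([t for t, r in zip(trajectories, returns) if r >= cutoff])
    return filtered
```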
This list is automatically generated from the titles and abstracts of the papers in this site.