SS-MAIL: Self-Supervised Multi-Agent Imitation Learning
- URL: http://arxiv.org/abs/2110.08963v1
- Date: Mon, 18 Oct 2021 01:17:50 GMT
- Title: SS-MAIL: Self-Supervised Multi-Agent Imitation Learning
- Authors: Akshay Dharmavaram, Tejus Gupta, Jiachen Li, Katia P. Sycara
- Abstract summary: Multi-agent expert imitation is dominated by two families of algorithms: Behavioral Cloning (BC) and Adversarial Imitation Learning (AIL).
BC approaches suffer from compounding errors, as they ignore the sequential decision-making nature of the trajectory generation problem.
AIL methods are plagued with instability in their training dynamics.
We introduce a novel self-supervised loss that encourages the discriminator to approximate a richer reward function.
- Score: 18.283839252425803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current landscape of multi-agent expert imitation is broadly dominated by
two families of algorithms - Behavioral Cloning (BC) and Adversarial Imitation
Learning (AIL). BC approaches suffer from compounding errors, as they ignore
the sequential decision-making nature of the trajectory generation problem.
Furthermore, they cannot effectively model multi-modal behaviors. While AIL
methods solve the issue of compounding errors and multi-modal policy training,
they are plagued with instability in their training dynamics. In this work, we
address this issue by introducing a novel self-supervised loss that encourages
the discriminator to approximate a richer reward function. We employ our method
to train a graph-based multi-agent actor-critic architecture that learns a
centralized policy, conditioned on a learned latent interaction graph. We show
that our method (SS-MAIL) outperforms prior state-of-the-art methods on
real-world prediction tasks, as well as on custom-designed synthetic
experiments. We prove that SS-MAIL is part of the family of AIL methods by
providing a theoretical connection to cost-regularized apprenticeship learning.
Moreover, we leverage the self-supervised formulation to introduce a novel
teacher forcing-based curriculum (Trajectory Forcing) that improves sample
efficiency by progressively increasing the length of the generated trajectory.
The SS-MAIL framework improves multi-agent imitation by stabilizing policy training, improving
reward shaping, and enabling the modeling of multi-modal trajectories.
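The abstract describes two mechanisms in prose only: a self-supervised discriminator loss that pushes the discriminator toward a richer reward function, and a teacher forcing-based curriculum (Trajectory Forcing) that progressively lengthens the generated trajectory. The minimal PyTorch-style sketch below illustrates how such pieces could be wired together; the network sizes, the particular auxiliary target (a discounted sum of future discriminator logits), the linear curriculum schedule, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of (1) a GAIL-style discriminator trained with an extra
# self-supervised loss so its logits behave like a richer reward signal, and
# (2) "Trajectory Forcing": generating only the first k steps with the policy
# and teacher-forcing the remainder from expert data, with k grown over training.
# All specifics below are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Maps (state, action) pairs to a scalar logit, used both as a classifier
    and as a learned reward for the actor-critic update."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def discriminator_loss(disc, expert_batch, policy_batch, aux_weight=0.1):
    """Adversarial loss plus a self-supervised auxiliary term.

    The auxiliary term regresses the expert logits toward a discounted sum of
    future expert logits (a value-like target). This target is an assumption
    chosen only to illustrate the idea of a 'richer' reward.
    """
    exp_obs, exp_act = expert_batch          # time-ordered: (T, obs_dim), (T, act_dim)
    pol_obs, pol_act = policy_batch

    exp_logits = disc(exp_obs, exp_act)
    pol_logits = disc(pol_obs, pol_act)

    # Standard binary-classification (GAIL-style) objective.
    adv_loss = (
        F.binary_cross_entropy_with_logits(exp_logits, torch.ones_like(exp_logits))
        + F.binary_cross_entropy_with_logits(pol_logits, torch.zeros_like(pol_logits))
    )

    # Self-supervised target: discounted return of logits along the expert
    # trajectory, computed without gradients.
    gamma = 0.99
    with torch.no_grad():
        returns = torch.zeros_like(exp_logits)
        running = 0.0
        for t in reversed(range(len(exp_logits))):
            running = exp_logits[t] + gamma * running
            returns[t] = running
    aux_loss = F.mse_loss(exp_logits, returns)

    return adv_loss + aux_weight * aux_loss


def trajectory_forcing_rollout(policy, env, expert_traj, gen_len):
    """Roll the policy for `gen_len` steps, then teacher-force the remainder by
    replaying the matched expert actions (classic gym-style env assumed)."""
    obs = env.reset()
    steps = []
    for t, (_, exp_act) in enumerate(expert_traj):
        if t < gen_len:
            act = policy(torch.as_tensor(obs, dtype=torch.float32)).detach().numpy()
        else:
            act = exp_act                      # copy the expert action
        next_obs, _, done, _ = env.step(act)
        steps.append((obs, act))
        obs = next_obs
        if done:
            break
    return steps


def curriculum_schedule(iteration, horizon, warmup_iters=1000):
    """Linearly grow the generated prefix from 1 step to the full horizon."""
    frac = min(1.0, iteration / warmup_iters)
    return max(1, int(frac * horizon))
```

In a full training loop one would expect `curriculum_schedule` to set `gen_len` each iteration, the resulting rollouts to supply `policy_batch` for `discriminator_loss`, and the discriminator logit to serve as the reward for the centralized actor-critic update.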
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- HiMAP: Learning Heuristics-Informed Policies for Large-Scale Multi-Agent Pathfinding [16.36594480478895]
Heuristics-Informed Multi-Agent Pathfinding (HiMAP)
arXiv Detail & Related papers (2024-02-23T13:01:13Z)
- Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning [4.40301653518681]
Agent-based models (ABMs) have shown promise for modelling various real world phenomena incompatible with traditional equilibrium analysis.
Recent developments in multi-agent reinforcement learning (MARL) offer a way to address this issue from a rationality perspective.
We propose a novel technique for representing heterogeneous processing-constrained agents within a MARL framework.
arXiv Detail & Related papers (2024-02-01T17:21:45Z)
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
- Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning [20.401609420707867]
We propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL).
Our method achieves better performance regarding formation error, formation convergence rate and on-par success rate of obstacle avoidance compared with baselines.
arXiv Detail & Related papers (2021-11-14T13:02:45Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.