REMAX: Relational Representation for Multi-Agent Exploration
- URL: http://arxiv.org/abs/2008.05214v2
- Date: Sat, 5 Feb 2022 06:03:08 GMT
- Title: REMAX: Relational Representation for Multi-Agent Exploration
- Authors: Heechang Ryu, Hayong Shin, Jinkyoo Park
- Abstract summary: We propose a learning-based exploration strategy to generate the initial states of a game.
We demonstrate that our method improves the training and performance of the MARL model more than the existing exploration methods.
- Score: 13.363887960136102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training a multi-agent reinforcement learning (MARL) model with a sparse
reward is generally difficult because numerous combinations of interactions
among agents induce a certain outcome (i.e., success or failure). Earlier
studies have tried to resolve this issue by employing an intrinsic reward to
induce interactions that are helpful for learning an effective policy. However,
this approach requires extensive prior knowledge for designing an intrinsic
reward. To train the MARL model effectively without designing the intrinsic
reward, we propose a learning-based exploration strategy to generate the
initial states of a game. The proposed method adopts a variational graph
autoencoder to represent a game state such that (1) the state can be compactly
encoded to a latent representation by considering relationships among agents,
and (2) the latent representation can be used as an effective input for a
coupled surrogate model to predict an exploration score. The proposed method
then finds new latent representations that maximize the exploration scores and
decodes these representations to generate initial states from which the MARL
model starts training in the game and thus experiences novel and rewardable
states. We demonstrate that our method improves the training and performance of
the MARL model more than the existing exploration methods.
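The loop sketched below illustrates the latent-space search described in the abstract: encode a visited state, climb the surrogate's predicted exploration score by gradient ascent in latent space, and decode the result into a new initial state. Plain MLPs stand in for the variational graph autoencoder and the coupled surrogate, so all module names and dimensions here are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: the paper uses a variational graph autoencoder over
# agent relationships plus a coupled surrogate model; simple MLPs are used here
# only to show the latent-space search loop.
state_dim, latent_dim = 32, 8
encoder = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
surrogate = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def propose_initial_state(seed_state, steps=50, lr=0.1):
    """Search the latent space for a representation that maximizes the
    predicted exploration score, then decode it into a new initial state."""
    z = encoder(seed_state).detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-surrogate(z).mean()).backward()   # ascend the predicted exploration score
        opt.step()
    return decoder(z).detach()              # candidate initial state for MARL training

# Usage: generate a novel starting state from a previously visited one.
new_start = propose_initial_state(torch.randn(1, state_dim))
```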
Related papers
- Imagine, Initialize, and Explore: An Effective Exploration Method in
Multi-Agent Reinforcement Learning [27.81925751697255]
We propose a novel method for efficient multi-agent exploration in complex scenarios.
We formulate the imagination as a sequence modeling problem, where the states, observations, prompts, actions, and rewards are predicted autoregressively.
By initializing agents at the critical states, IIE significantly increases the likelihood of discovering potentially important underexplored regions.
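A rough sketch of the imagination-as-sequence-modeling idea, with a GRU standing in for IIE's sequence model and a toy "criticality" heuristic for choosing the initialization state; none of the names, dimensions, or heuristics below reflect the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Toy stand-in for a sequence model that autoregressively predicts the next
# trajectory token (state/observation/action/reward features) from a prompt.
token_dim, hidden_dim = 16, 64
seq_model = nn.GRU(token_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, token_dim)

def imagine_trajectory(prompt_tokens, horizon=20):
    """Autoregressively extend an imagined trajectory from a prompt."""
    tokens = list(prompt_tokens)
    for _ in range(horizon):
        x = torch.stack(tokens).unsqueeze(0)        # (1, T, token_dim)
        out, _ = seq_model(x)
        tokens.append(head(out[:, -1]).squeeze(0))  # predicted next token
    return torch.stack(tokens)

# Pick a "critical" token to initialize agents at; here simply the token with
# the largest value in its last (reward-like) feature, a purely illustrative rule.
traj = imagine_trajectory([torch.randn(token_dim) for _ in range(3)])
critical_state = traj[traj[:, -1].argmax()]
```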
arXiv Detail & Related papers (2024-02-28T01:45:01Z) - MA2CL: Masked Attentive Contrastive Learning for Multi-Agent
Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages the learned representation to be predictive at both the temporal and the agent level by reconstructing masked agent observations in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
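A minimal sketch of masked reconstruction in latent space, assuming a simple linear observation encoder and multi-head attention over agents; MA2CL's actual encoder and contrastive objective differ, so this only illustrates the masking-and-reconstruction step.

```python
import torch
import torch.nn as nn

# One agent's latent is replaced by a learned mask token and recovered by
# attending over the remaining agents' latents.
n_agents, obs_dim, latent_dim = 4, 12, 32
obs_encoder = nn.Linear(obs_dim, latent_dim)
attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
mask_token = nn.Parameter(torch.zeros(latent_dim))

def masked_reconstruction_loss(obs, masked_idx):
    """Reconstruct the masked agent's latent from the other agents' latents."""
    z = obs_encoder(obs)                     # (batch, n_agents, latent_dim)
    target = z[:, masked_idx].detach()
    z_masked = z.clone()
    z_masked[:, masked_idx] = mask_token     # hide the chosen agent
    recon, _ = attn(z_masked, z_masked, z_masked)
    return ((recon[:, masked_idx] - target) ** 2).mean()

loss = masked_reconstruction_loss(torch.randn(8, n_agents, obs_dim), masked_idx=2)
```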
arXiv Detail & Related papers (2023-06-03T05:32:19Z) - MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings such as auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z) - Embedding Contextual Information through Reward Shaping in Multi-Agent
Learning: A Case Study from Google Football [0.0]
We create a novel reward shaping method by embedding contextual information in the reward function.
We demonstrate this in the Google Research Football (GRF) environment.
Experimental results show that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with sparse reward signals.
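A minimal sketch of the idea, assuming a hypothetical context feature (ball-to-goal distance) for the shaping term; the paper's exact shaping design in GRF is not reproduced here.

```python
# Augment the sparse environment reward with a dense term computed from
# contextual information, scaled so the original objective still dominates.
def shaped_reward(env_reward, context, weight=0.05):
    # The distance feature below is an illustrative assumption.
    dense_bonus = -context["ball_to_goal_distance"]
    return env_reward + weight * dense_bonus

r = shaped_reward(env_reward=0.0, context={"ball_to_goal_distance": 23.5})
```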
arXiv Detail & Related papers (2023-03-25T10:21:13Z) - Strangeness-driven Exploration in Multi-Agent Reinforcement Learning [0.0]
We introduce a new strangeness-based exploration method that can be easily incorporated into any centralized training and decentralized execution (CTDE)-based MARL algorithm.
The exploration bonus is obtained from the strangeness, and the proposed exploration method is not much affected by the transitions commonly observed in MARL tasks.
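A generic sketch of such a bonus, with an autoencoder's reconstruction error on the joint observation standing in for the strangeness measure; the paper's exact definition of strangeness may differ.

```python
import torch
import torch.nn as nn

# The autoencoder is assumed to be trained online on visited observations, so
# familiar observations reconstruct well and yield a small bonus.
obs_dim = 24
autoencoder = nn.Sequential(nn.Linear(obs_dim, 16), nn.ReLU(), nn.Linear(16, obs_dim))

def exploration_bonus(joint_obs, scale=0.1):
    with torch.no_grad():
        recon = autoencoder(joint_obs)
    return scale * ((recon - joint_obs) ** 2).mean().item()

reward = 0.0 + exploration_bonus(torch.randn(obs_dim))  # added to the env reward
```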
arXiv Detail & Related papers (2022-12-27T11:08:49Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
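A minimal sketch of the idea, with ensemble disagreement standing in for uncertainty in the learned reward; the ensemble and interfaces below are assumptions, not the paper's reward-learning pipeline.

```python
import torch
import torch.nn as nn

# Intrinsic reward = disagreement (standard deviation) of an ensemble of
# learned reward models on a state-action pair.
sa_dim, n_models = 20, 5
ensemble = [nn.Sequential(nn.Linear(sa_dim, 32), nn.ReLU(), nn.Linear(32, 1))
            for _ in range(n_models)]

def intrinsic_reward(state_action):
    with torch.no_grad():
        preds = torch.stack([m(state_action) for m in ensemble])
    return preds.std(dim=0).item()   # high disagreement -> explore more

bonus = intrinsic_reward(torch.randn(sa_dim))
```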
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Imaginary Hindsight Experience Replay: Curious Model-based Learning for
Sparse Reward Tasks [9.078290260836706]
We propose a model-based method tailored for sparse-reward tasks that foregoes the need for complicated reward engineering.
This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates.
Upon evaluation, this approach provides an order-of-magnitude increase in data efficiency on average compared to the state-of-the-art model-free method on the benchmark OpenAI Gym Fetch Robotics tasks.
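A conceptual sketch of mixing imagined transitions and hindsight-relabeled goals into a replay buffer; the dynamics model, goal extraction, and reward convention below are hypothetical placeholders rather than the paper's implementation.

```python
import numpy as np

def relabel_goal(s, a, s_next):
    """Hindsight relabeling: treat the achieved next state as the goal."""
    achieved_goal = s_next.copy()
    return (s, a, 0.0, s_next, achieved_goal)   # reward 0.0 marks 'goal reached'

def augment_buffer(buffer, real_transitions, dynamics_model, imagined_per_real=4):
    for (s, a, r, s_next, goal) in real_transitions:
        buffer.append((s, a, r, s_next, goal))        # real transition
        buffer.append(relabel_goal(s, a, s_next))     # hindsight relabel
        for _ in range(imagined_per_real):            # imaginary transitions
            a_im = np.random.uniform(-1.0, 1.0, size=a.shape)
            buffer.append(relabel_goal(s, a_im, dynamics_model(s, a_im)))

# Usage with a toy dynamics model standing in for the learned one.
buf = []
augment_buffer(buf, [(np.zeros(3), np.zeros(2), 0.0, np.ones(3), np.ones(3))],
               dynamics_model=lambda s, a: s + 0.1)
```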
arXiv Detail & Related papers (2021-10-05T23:38:31Z) - Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
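A rough sketch of such a discovery bonus, combining the inverse model's prediction error with a k-step learning-progress term; the architecture and exact form of the objective are illustrative only.

```python
import torch
import torch.nn as nn

# Inverse model predicts the taken action from (s, s'); the bonus is its error
# plus how much that error has dropped over the last k evaluations.
state_dim, action_dim, k = 16, 4, 10
inverse_model = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(),
                              nn.Linear(64, action_dim))
error_history = []

def discovery_bonus(s, s_next, a_taken):
    with torch.no_grad():
        pred_a = inverse_model(torch.cat([s, s_next]))
    err = ((pred_a - a_taken) ** 2).mean().item()
    error_history.append(err)
    progress = error_history[-k] - err if len(error_history) > k else 0.0
    return err + max(progress, 0.0)

b = discovery_bonus(torch.randn(state_dim), torch.randn(state_dim), torch.randn(action_dim))
```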
arXiv Detail & Related papers (2021-09-28T10:11:07Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low-rank MDPs.
The key algorithmic contribution is a new minimax representation learning objective.
The result can accommodate general function approximation to scale to complex environments.
arXiv Detail & Related papers (2021-02-14T00:06:54Z) - A New Framework for Query Efficient Active Imitation Learning [5.167794607251493]
A human expert knows the rewards and unsafe states based on their preferences and objectives, but querying that expert is expensive.
We propose a new framework for imitation learning (IL) that actively and interactively learns a model of the user's reward function with efficient queries.
We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games.
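An illustrative query-selection loop under these constraints: ask the expensive expert only about the states where an ensemble of reward models disagrees most, within a fixed budget. The uncertainty measure and interfaces are assumptions, not the paper's protocol.

```python
import numpy as np

def select_queries(candidate_states, reward_ensemble, budget=10):
    """Return indices of the most uncertain candidate states to query."""
    preds = np.stack([[m(s) for s in candidate_states] for m in reward_ensemble])
    uncertainty = preds.std(axis=0)            # disagreement per state
    return np.argsort(-uncertainty)[:budget]   # most uncertain first

# Usage with toy linear reward models; query the expert on these states,
# then refit the reward models and repeat.
idx = select_queries(np.random.randn(100, 6),
                     [lambda s, w=w: float(s @ w) for w in np.random.randn(3, 6)])
```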
arXiv Detail & Related papers (2019-12-30T18:12:27Z)