AutoDIME: Automatic Design of Interesting Multi-Agent Environments
- URL: http://arxiv.org/abs/2203.02481v1
- Date: Fri, 4 Mar 2022 18:25:33 GMT
- Title: AutoDIME: Automatic Design of Interesting Multi-Agent Environments
- Authors: Ingmar Kanitscheider and Harri Edwards
- Abstract summary: We examine a set of intrinsic teacher rewards derived from prediction problems that can be applied in multi-agent settings.
Of the intrinsic rewards considered, we found value disagreement to be the most consistent across tasks.
Our results suggest that intrinsic teacher rewards, and in particular value disagreement, are a promising approach for automating both single and multi-agent environment design.
- Score: 3.1546318469750205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing a distribution of environments in which RL agents can learn
interesting and useful skills is a challenging and poorly understood task; for
multi-agent environments, the difficulties are only exacerbated. One approach is
to train a second RL agent, called a teacher, who samples environments that are
conducive for the learning of student agents. However, most previous proposals
for teacher rewards do not generalize straightforwardly to the multi-agent
setting. We examine a set of intrinsic teacher rewards derived from prediction
problems that can be applied in multi-agent settings and evaluate them in
MuJoCo tasks such as multi-agent Hide and Seek as well as a diagnostic
single-agent maze task. Of the intrinsic rewards considered, we found value
disagreement to be most consistent across tasks, leading to faster and more
reliable emergence of advanced skills in Hide and Seek and the maze task.
Another candidate intrinsic reward considered, value prediction error, also
worked well in Hide and Seek but was susceptible to noisy-TV style distractions
in stochastic environments. Policy disagreement performed well in the maze task
but did not speed up learning in Hide and Seek. Our results suggest that
intrinsic teacher rewards, and in particular value disagreement, are a
promising approach for automating both single and multi-agent environment
design.
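To make the teacher's objective concrete: with a value-disagreement reward, the teacher is paid for proposing environments on which an ensemble of the student's value heads disagrees. The following is a minimal sketch under that assumption; the interfaces and names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def value_disagreement_reward(value_ensemble, states):
    """Teacher reward for one sampled environment: the disagreement (standard
    deviation) of an ensemble of student value estimates, averaged over the
    states the student visited there. Interfaces are illustrative only."""
    preds = np.stack([v(states) for v in value_ensemble], axis=0)  # (heads, T)
    return float(preds.std(axis=0).mean())

# Toy usage: five random linear value heads over a batch of visited states.
rng = np.random.default_rng(0)
heads = [(lambda s, w=rng.normal(size=4): s @ w) for _ in range(5)]
states = rng.normal(size=(10, 4))
print("teacher reward:", value_disagreement_reward(heads, states))
```

Intuitively, disagreement is low on environments the student has either mastered or cannot make progress on, and highest at the frontier of its current abilities, which is what makes it a plausible signal for the teacher.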
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
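The learnability objective above is not defined in the summary; a commonly used success-rate form scores a scenario as p(1 - p), which peaks for tasks the agent solves about half the time. The sketch below assumes that formulation, and the helper names are hypothetical.

```python
def learnability_score(successes: int, attempts: int) -> float:
    """Success-rate learnability p * (1 - p): zero for levels that are always
    or never solved, maximal for levels solved about half the time."""
    p = successes / max(attempts, 1)
    return p * (1.0 - p)

def prioritise(level_stats, k=3):
    """Return the k level ids with the highest learnability (illustrative helper)."""
    scored = {lvl: learnability_score(s, n) for lvl, (s, n) in level_stats.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Toy usage: level id -> (successes, attempts)
stats = {"easy": (98, 100), "impossible": (0, 100), "frontier": (45, 100)}
print(prioritise(stats))  # the partially solved "frontier" level ranks first
```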
- Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions [68.92637077909693]
This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment.
A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content.
Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions.
arXiv Detail & Related papers (2024-08-05T15:16:22Z) - Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function, but in sparse reward environments this signal is rarely available.
The solution to such a problem may be to equip the agent with an intrinsic motivation that will provide informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
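As a rough illustration of the distillation-error novelty signal in the SND entry above, the sketch below regresses a predictor toward a frozen target's features and uses the remaining error as the intrinsic reward. Both networks are plain linear maps and the target is random (RND-style); SND itself distils a self-supervised target, so treat this purely as an interface sketch.

```python
import numpy as np

class DistillationNovelty:
    """Minimal sketch of a distillation-error novelty bonus (RND/SND family):
    a predictor is regressed toward a frozen target's features and the
    remaining error is the intrinsic reward. Linear maps stand in for the
    networks; the real SND distils self-supervised targets."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.normal(size=(obs_dim, feat_dim))  # frozen target features
        self.predictor = np.zeros((obs_dim, feat_dim))      # distilled online
        self.lr = lr

    def intrinsic_reward(self, obs):
        """Return the current distillation error for obs and take one
        regression step, so repeatedly seen observations stop looking novel."""
        err = obs @ self.predictor - obs @ self.target      # (feat_dim,)
        reward = float(np.mean(err ** 2))
        self.predictor -= self.lr * np.outer(obs, err)      # gradient of 0.5 * ||err||^2
        return reward

bonus = DistillationNovelty(obs_dim=8)
obs = np.random.default_rng(1).normal(size=8)
print([round(bonus.intrinsic_reward(obs), 4) for _ in range(3)])  # shrinks with repetition
```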
- The StarCraft Multi-Agent Challenges+: Learning of Multi-Stage Tasks and Environmental Factors without Precise Reward Functions [14.399479538886064]
We propose a novel benchmark called the StarCraft Multi-Agent Challenges+.
The challenge focuses on the exploration capability of MARL algorithms: efficiently learning implicit multi-stage tasks and environmental factors as well as micro-control.
We investigate MARL algorithms under SMAC+ and observe that recent approaches work well in similar settings to the previous challenges, but misbehave in offensive scenarios.
arXiv Detail & Related papers (2022-07-05T12:43:54Z)
- Open-Ended Learning Leads to Generally Capable Agents [12.079718607356178]
We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are capable across this vast space and beyond.
The resulting space is exceptionally diverse in terms of the challenges posed to agents, and as such, even measuring the learning progress of an agent is an open research problem.
We show that through constructing an open-ended learning process, which dynamically changes the training task distributions and training objectives such that the agent never stops learning, we achieve consistent learning of new behaviours.
arXiv Detail & Related papers (2021-07-27T13:30:07Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
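A hedged sketch of the reward structure of the adversarial game above: one policy is paid for surprise and the other for its absence. Surprise is approximated here by a count-based log(1/frequency) rather than the learned observation model used in the paper, and all names are illustrative.

```python
from collections import defaultdict
import numpy as np

class AdversarialSurprise:
    """Sketch of an adversarial-surprise reward split: an Explore policy is paid
    for surprise and a Control policy for its absence. Surprise is a simple
    count-based log(1/frequency) here; the paper uses a learned model of
    observations instead."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.total = 0

    def surprise(self, obs_key):
        self.counts[obs_key] += 1
        self.total += 1
        p = self.counts[obs_key] / self.total  # empirical frequency so far
        return float(np.log(1.0 / p))          # rare observations are surprising

    def rewards(self, obs_key):
        s = self.surprise(obs_key)
        return {"explore": s, "control": -s}   # zero-sum split between the two policies

game = AdversarialSurprise()
for obs in ["a", "a", "b", "a", "c"]:
    print(obs, game.rewards(obs))
```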
- Discovery of Options via Meta-Learned Subgoals [59.2160583043938]
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
We introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments.
arXiv Detail & Related papers (2021-02-12T19:50:40Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
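The reward flow implied by the incentive mechanism above can be sketched as follows: each agent's incentive function pays rewards to the other agents, and the giver is charged for what it hands out. The interfaces are hypothetical, and the key ingredient of the actual method (training the incentive function through the recipients' learning updates) is omitted.

```python
import numpy as np

def augment_rewards(env_rewards, incentive_fns, obs, actions, cost_coef=1.0):
    """Sketch of incentive-augmented rewards: agent i pays incentive_fns[i](...)
    to the other agents and is charged a cost for what it gives. All interfaces
    are hypothetical; training the incentive functions is not shown."""
    n = len(env_rewards)
    total = np.array(env_rewards, dtype=float)
    for i in range(n):
        given = incentive_fns[i](obs[i], actions)  # payments to each agent, length n
        given[i] = 0.0                             # no self-incentives
        total += given                             # recipients receive the payments
        total[i] -= cost_coef * given.sum()        # the giver pays for them
    return total

# Toy usage: two agents with fixed incentive functions.
fns = [lambda o, a: np.array([0.0, 0.5]), lambda o, a: np.array([0.2, 0.0])]
print(augment_rewards([1.0, 0.0], fns, obs=[None, None], actions=[0, 1]))  # [0.7 0.3]
```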
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
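The impact-driven bonus above rewards changes in a learned state embedding, damped by an episodic visitation count so that bouncing between the same two states is not paid repeatedly. The sketch below assumes the embedding is trained elsewhere (RIDE trains it with forward and inverse dynamics losses) and simply treats it as a callable.

```python
from collections import defaultdict
import numpy as np

def ride_reward(phi, s, s_next, episodic_counts):
    """RIDE-style intrinsic reward sketch: the change in a learned embedding phi
    between consecutive states, divided by the square root of the episodic
    visit count of the new state. phi is only assumed to be a callable here."""
    episodic_counts[tuple(s_next)] += 1
    impact = np.linalg.norm(phi(s_next) - phi(s))             # representation change
    return impact / np.sqrt(episodic_counts[tuple(s_next)])   # damp revisits within an episode

# Toy usage with an identity embedding over 2-D grid positions.
counts = defaultdict(int)
phi = lambda s: np.asarray(s, dtype=float)
print(ride_reward(phi, (0, 0), (0, 1), counts))   # 1.0
print(ride_reward(phi, (0, 0), (0, 1), counts))   # ~0.71, smaller on revisit
```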
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
- Individual specialization in multi-task environments with multiagent reinforcement learners [0.0]
There is a growing interest in Multi-Agent Reinforcement Learning (MARL) as the first steps towards building general intelligent agents.
Previous results point us towards increased conditions for coordination, efficiency/fairness, and common-pool resource sharing.
We further study coordination in multi-task environments where several rewarding tasks can be performed and thus agents don't necessarily need to perform well in all tasks, but under certain conditions may specialize.
arXiv Detail & Related papers (2019-12-29T15:20:24Z)