Continuous Coordination As a Realistic Scenario for Lifelong Learning
- URL: http://arxiv.org/abs/2103.03216v1
- Date: Thu, 4 Mar 2021 18:44:03 GMT
- Title: Continuous Coordination As a Realistic Scenario for Lifelong Learning
- Authors: Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar
- Abstract summary: We introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings.
We evaluate several recent MARL methods and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes.
We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.
- Score: 6.044372319762058
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Current deep reinforcement learning (RL) algorithms are still highly
task-specific and lack the ability to generalize to new environments. Lifelong
learning (LLL), however, aims at solving multiple tasks sequentially by
efficiently transferring and using knowledge between tasks. Despite a surge of
interest in lifelong RL in recent years, the lack of a realistic testbed makes
robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the
other hand, can be seen as a natural scenario for lifelong RL due to its
inherent non-stationarity, since the agents' policies change over time. In this
work, we introduce a multi-agent lifelong learning testbed that supports both
zero-shot and few-shot settings. Our setup is based on Hanabi -- a
partially-observable, fully cooperative multi-agent game that has been shown to
be challenging for zero-shot coordination. Its large strategy space makes it a
desirable environment for lifelong RL tasks. We evaluate several recent MARL
methods, and benchmark state-of-the-art LLL algorithms in limited memory and
computation regimes to shed light on their strengths and weaknesses. This
continual learning paradigm also provides us with a pragmatic way of going
beyond centralized training which is the most commonly used training protocol
in MARL. We empirically show that the agents trained in our setup are able to
coordinate well with unseen agents, without any additional assumptions made by
previous works.
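To make the setup described above concrete, the sketch below shows the shape of such a lifelong coordination loop: a learner is trained sequentially against a pool of frozen, pre-trained partners (coordinating with each partner is one task) and is then evaluated zero-shot or few-shot against held-out partners. Every name here (`Partner`, `Learner`, `play_episodes`, ...) is an illustrative placeholder, not the paper's code or the Hanabi Learning Environment API; it is a minimal sketch of the training/evaluation protocol, with the environment interaction elided.

```python
# Minimal sketch of the lifelong coordination protocol described above.
# All names are placeholders for illustration only.
import random
from typing import List

class Partner:
    """A frozen, pre-trained partner policy; coordinating with it is one task."""
    def act(self, observation):
        return random.choice(observation["legal_moves"])  # stand-in policy

class Learner:
    """The lifelong learner that must coordinate with a stream of partners."""
    def act(self, observation):
        return random.choice(observation["legal_moves"])  # stand-in policy

    def update(self, trajectories):
        # An RL update, optionally combined with a lifelong-learning method
        # (e.g. a regularizer or small replay buffer) under a fixed
        # memory/compute budget.
        pass

def play_episodes(learner, partner, num_episodes):
    """Roll out cooperative episodes (e.g. 2-player Hanabi) and return
    (trajectories, mean score). Environment interaction is elided here."""
    trajectories, scores = [], []
    for _ in range(num_episodes):
        trajectories.append([])
        scores.append(0.0)
    return trajectories, sum(scores) / max(1, len(scores))

def lifelong_training(learner: Learner, train_partners: List[Partner],
                      updates_per_task: int = 100):
    # Tasks arrive sequentially: each new partner changes the effective
    # environment, which is what makes the stream non-stationary.
    for partner in train_partners:
        for _ in range(updates_per_task):
            trajs, _ = play_episodes(learner, partner, num_episodes=8)
            learner.update(trajs)  # no joint training with past partners

def evaluate(learner: Learner, held_out_partners: List[Partner],
             few_shot_episodes: int = 0):
    # Zero-shot: few_shot_episodes == 0. Few-shot: a small adaptation budget
    # with the unseen partner before scoring.
    scores = []
    for partner in held_out_partners:
        if few_shot_episodes > 0:
            trajs, _ = play_episodes(learner, partner, few_shot_episodes)
            learner.update(trajs)
        _, score = play_episodes(learner, partner, num_episodes=32)
        scores.append(score)
    return sum(scores) / max(1, len(scores))
```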
Related papers
- Statistical Guarantees for Lifelong Reinforcement Learning using PAC-Bayesian Theory [37.02104729448692]
EPIC is a novel algorithm designed for lifelong reinforcement learning.
It learns a shared policy distribution, referred to as the "world policy", which enables rapid adaptation to new tasks.
Experiments on a variety of environments demonstrate that EPIC significantly outperforms existing methods in lifelong RL.
arXiv Detail & Related papers (2024-11-01T07:01:28Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
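One common way to instantiate a risk-level-controlled robust objective of this kind is to optimize the mean return over the worst alpha-fraction of tasks (CVaR) rather than the plain average. The sketch below only illustrates that idea; it is not RoML's actual objective or algorithm.

```python
# Illustrative risk-controlled meta objective: CVaR_alpha over per-task returns
# instead of the average. Not the paper's exact formulation.
import numpy as np

def cvar_objective(task_returns: np.ndarray, alpha: float = 0.2) -> float:
    """Mean return over the worst alpha-fraction of tasks."""
    k = max(1, int(np.ceil(alpha * len(task_returns))))
    return float(np.sort(task_returns)[:k].mean())

returns = np.array([9.0, 8.5, 9.2, 1.0, 0.5])  # two hard tasks fail badly
print(returns.mean())                      # ~5.64: the average looks acceptable
print(cvar_objective(returns, alpha=0.4))  # 0.75: the robust objective exposes the failures
```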
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method to address this, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Learning Progress Driven Multi-Agent Curriculum [18.239527837186216]
Curriculum reinforcement learning aims to speed up learning by gradually increasing the difficulty of a task.
We propose self-paced MARL (SPMARL) to prioritize tasks based on learning progress instead of the episode return.
arXiv Detail & Related papers (2022-05-20T08:16:30Z)
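A generic way to implement learning-progress prioritization is to track a window of recent returns per task and sample tasks with probability proportional to the absolute change in their recent performance. The sketch below is an illustration of that idea only, not necessarily SPMARL's exact formulation.

```python
# Illustrative learning-progress curriculum: sample tasks whose performance is
# changing fastest, rather than ranking them by raw episode return.
import random
from collections import deque

class ProgressCurriculum:
    def __init__(self, task_ids, window: int = 10):
        self.window = window
        self.history = {t: deque(maxlen=2 * window) for t in task_ids}

    def record(self, task_id, episode_return: float):
        self.history[task_id].append(episode_return)

    def _progress(self, task_id) -> float:
        h = list(self.history[task_id])
        if len(h) < 2 * self.window:
            return 1.0  # under-sampled tasks keep a high default priority
        old = sum(h[:self.window]) / self.window
        new = sum(h[self.window:]) / self.window
        return abs(new - old)  # absolute learning progress

    def sample(self):
        tasks = list(self.history)
        weights = [self._progress(t) + 1e-6 for t in tasks]
        return random.choices(tasks, weights=weights, k=1)[0]
```

A trainer would call `record` after each episode and `sample` to pick the next task to train on.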
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:45:12Z)
- REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents [0.0]
In this paper, we introduce a meta-learning scheme that shifts the objective of learning to solve a task into the objective of learning to learn to solve a task (or a set of tasks).
Our model, named REIN-2, is a meta-learning scheme formulated within the RL framework, the goal of which is to develop a meta-RL agent that learns how to produce other RL agents.
Experimental results show that, compared to traditional state-of-the-art deep RL algorithms, our model achieves remarkable performance in popular OpenAI Gym environments.
arXiv Detail & Related papers (2021-10-11T10:13:49Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
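As a toy illustration of the sampling idea behind this entry (not the paper's algorithm or its RL instantiation), the sketch below runs noisy, Langevin-style gradient updates for two competing players on a simple min-max objective; the Gaussian noise turns each player's update into approximate sampling rather than pure optimization.

```python
# Illustrative Langevin-dynamics updates for a two-player (min-max) training game
# on a toy objective f(x, y) = x*y + 0.1*x**2 - 0.1*y**2.
import numpy as np

rng = np.random.default_rng(0)
x, y = 1.0, -1.0          # protagonist and adversary parameters (scalars here)
step, noise = 0.05, 0.05  # step size and Langevin noise scale

def grad_x(x, y): return y + 0.2 * x   # d f / d x
def grad_y(x, y): return x - 0.2 * y   # d f / d y

for _ in range(2000):
    # Protagonist descends f, adversary ascends f; both add Gaussian noise,
    # which makes each update an approximate sampling step.
    x -= step * grad_x(x, y) + np.sqrt(2 * step) * noise * rng.normal()
    y += step * grad_y(x, y) + np.sqrt(2 * step) * noise * rng.normal()

print(x, y)  # both players fluctuate around the equilibrium (0, 0)
```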