MAMBPO: Sample-efficient multi-robot reinforcement learning using
learned world models
- URL: http://arxiv.org/abs/2103.03662v1
- Date: Fri, 5 Mar 2021 13:37:23 GMT
- Title: MAMBPO: Sample-efficient multi-robot reinforcement learning using
learned world models
- Authors: Daniël Willemsen, Mario Coppola and Guido C.H.E. de Croon
- Abstract summary: Multi-robot systems can benefit from reinforcement learning (RL) algorithms that learn behaviours in a small number of trials.
We present a novel multi-agent model-based RL algorithm: Multi-Agent Model-Based Policy Optimization (MAMBPO).
- Score: 4.84279798426797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-robot systems can benefit from reinforcement learning (RL) algorithms
that learn behaviours in a small number of trials, a property known as sample
efficiency. This research thus investigates the use of learned world models to
improve sample efficiency. We present a novel multi-agent model-based RL
algorithm: Multi-Agent Model-Based Policy Optimization (MAMBPO), utilizing the
Centralized Learning for Decentralized Execution (CLDE) framework. CLDE
algorithms allow a group of agents to act in a fully decentralized manner after
training. This is a desirable property for many systems comprising multiple
robots. MAMBPO uses a learned world model to improve sample efficiency compared
to model-free Multi-Agent Soft Actor-Critic (MASAC). We demonstrate this on two
simulated multi-robot tasks, where MAMBPO achieves performance similar to
MASAC, but requires far fewer samples to do so. Through this, we take an
important step towards making real-life learning for multi-robot systems
possible.
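The abstract's recipe lends itself to a Dyna-style reading: train a world model on real transitions, branch short imagined rollouts from real states, and feed both real and imagined data to a soft actor-critic learner with a centralized critic and decentralized per-agent actors. The sketch below illustrates that loop only; the toy linear model, the placeholder policy, and the names (LinearWorldModel, decentralized_policy, masac_update) are assumptions for illustration, not the authors' implementation.

```python
# Minimal Dyna-style sketch of the MAMBPO idea (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
OBS, ACT, N_AGENTS = 4, 2, 2  # per-agent sizes (assumed for the toy)

class LinearWorldModel:
    """Toy learned dynamics: predicts the next joint observation from
    the joint observation and joint action by ridge regression."""
    def fit(self, obs, act, next_obs):
        X = np.hstack([obs, act])
        self.W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]),
                                 X.T @ next_obs)

    def rollout(self, obs, policy, horizon=3):
        """Short imagined rollouts branched from a real state."""
        imagined = []
        for _ in range(horizon):
            act = policy(obs)
            next_obs = np.hstack([obs, act]) @ self.W
            imagined.append((obs, act, next_obs))
            obs = next_obs
        return imagined

def decentralized_policy(joint_obs):
    """Placeholder actors: agent i maps only its own observation slice
    to its own action (decentralized execution)."""
    acts = [np.tanh(joint_obs[:, i*OBS:(i+1)*OBS][:, :ACT])
            for i in range(N_AGENTS)]
    return np.hstack(acts)

real_buffer, model_buffer = [], []
for step in range(200):
    obs = rng.normal(size=(1, N_AGENTS * OBS))
    act = decentralized_policy(obs)
    next_obs = obs + 0.1 * rng.normal(size=obs.shape)  # stand-in environment
    real_buffer.append((obs, act, next_obs))
    if step >= 20 and step % 20 == 0:
        o, a, no = (np.vstack(cols) for cols in zip(*real_buffer))
        model = LinearWorldModel()
        model.fit(o, a, no)
        model_buffer += model.rollout(obs, decentralized_policy)
        # masac_update(real_buffer + model_buffer)  # hypothetical learner:
        # centralized critic over joint obs/actions, per-agent actors

print(f"real transitions: {len(real_buffer)}, imagined: {len(model_buffer)}")
```

Keeping imagined rollouts short and branching them from real states is the standard way such methods limit compounding model error; the commented masac_update stands in for the multi-agent soft actor-critic updates the paper compares against.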
Related papers
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer to enable centralized representation aggregation.
Results on the StarCraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- Efficient Multi-agent Reinforcement Learning by Planning [33.51282615335009]
Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks.
Most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios.
We propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search; a toy sketch of this plan-inside-a-model pattern appears after this list.
arXiv Detail & Related papers (2024-05-20T04:36:02Z)
- Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation [8.940998315746684]
We propose a model-based reinforcement learning (RL) approach for robotic arm end-tasks.
We employ Bayesian neural network models to represent, probabilistically, both the belief and the information encoded in the dynamics model during exploration.
Our experiments show the advantages of our Bayesian model-based RL approach, with results of similar quality to relevant alternatives.
arXiv Detail & Related papers (2024-04-02T11:44:37Z)
- Physics-informed reinforcement learning via probabilistic co-adjustment functions [3.6787556334630334]
We introduce co-kriging adjustments (CKA) and ridge regression adjustment (RRA) as novel ways to combine the advantages of both approaches.
Our adjustment methods are based on an autoregressive AR(1) co-kriging model that we integrate with GP priors.
arXiv Detail & Related papers (2023-09-11T12:10:19Z)
- SAM-RL: Sensing-Aware Model-Based Reinforcement Learning via Differentiable Physics-Based Simulation and Rendering [49.78647219715034]
We propose a sensing-aware model-based reinforcement learning system called SAM-RL.
With the sensing-aware learning pipeline, SAM-RL allows a robot to select an informative viewpoint to monitor the task process.
We apply our framework to real world experiments for accomplishing three manipulation tasks: robotic assembly, tool manipulation, and deformable object manipulation.
arXiv Detail & Related papers (2022-10-27T05:30:43Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solve prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to the family of methods that nest reinforcement learning (RL) algorithms within population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors [3.4376560669160385]
In the real world, a robot might encounter any situation, from motor failures to finding itself on rocky terrain.
We show that FAMLE allows the robots to adapt to novel damages in significantly fewer time-steps than the baselines.
arXiv Detail & Related papers (2020-03-10T12:37:52Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)
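As a rough illustration of the plan-inside-a-model pattern named in the MAZero entry above (not MAZero itself, which uses learned networks and a far richer search), the following toy runs a UCT-style Monte Carlo Tree Search against a hand-coded stand-in for the centralized model; all dynamics, rewards, and names are assumptions.

```python
# Toy UCT search inside a stand-in "learned" model (illustrative only).
import math, random

random.seed(0)
ACTIONS = (-1, +1)
GOAL, HORIZON = 3, 6

def model_step(state, action):
    """Stand-in centralized model: deterministic walk on the integers,
    with reward 1 whenever the goal state is entered."""
    nxt = state + action
    return nxt, (1.0 if nxt == GOAL else 0.0)

class Node:
    def __init__(self, state, depth):
        self.state, self.depth = state, depth
        self.children = {}        # action -> child Node
        self.n, self.q = 0, 0.0   # visit count, mean return

def select_action(node, c=1.4):
    def score(a):
        child = node.children.get(a)
        if child is None or child.n == 0:
            return float("inf")   # try each action at least once
        return child.q + c * math.sqrt(math.log(node.n) / child.n)
    return max(ACTIONS, key=score)

def simulate(node):
    """One search iteration: descend by UCB, expand, back up returns."""
    if node.depth == HORIZON:
        return 0.0
    a = select_action(node)
    nxt, r = model_step(node.state, a)
    child = node.children.setdefault(a, Node(nxt, node.depth + 1))
    ret = r + simulate(child)
    child.n += 1
    child.q += (ret - child.q) / child.n
    node.n += 1
    return ret

def plan(root_state, iterations=300):
    root = Node(root_state, 0)
    for _ in range(iterations):
        simulate(root)
    return max(ACTIONS, key=lambda a: root.children[a].n)

print(plan(0))  # prints 1: stepping toward GOAL maximizes return
```

Because every "environment" interaction here happens inside the model, search effort substitutes for real samples, which is the sample-efficiency argument such model-plus-planning methods share with MAMBPO.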