Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2008.07875v1
- Date: Tue, 18 Aug 2020 11:57:33 GMT
- Title: Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep
Reinforcement Learning
- Authors: Wenshuai Zhao, Jorge Peña Queralta, Li Qingqing, Tomi Westerlund
- Abstract summary: We analyze how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems.
We introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning.
We discuss how both the type of perturbation and the number of agents experiencing it affect the collaborative learning effort.
- Score: 0.06554326244334865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current research directions in deep reinforcement learning include bridging
the simulation-reality gap, improving sample efficiency of experiences in
distributed multi-agent reinforcement learning, together with the development
of robust methods against adversarial agents in distributed learning, among
many others. In this work, we are particularly interested in analyzing how
multi-agent reinforcement learning can bridge the gap to reality in distributed
multi-robot systems where the operation of the different robots is not
necessarily homogeneous. These variations can happen due to sensing mismatches,
inherent errors in the calibration of the mechanical joints, or simple
differences in accuracy. While our results are simulation-based, we introduce
the effect of sensing, calibration, and accuracy mismatches in distributed
reinforcement learning with proximal policy optimization (PPO). We discuss
how both the type of perturbation and the number of agents experiencing it
affect the collaborative learning effort. The
simulations are carried out using a Kuka arm model in the Bullet physics
engine. This is, to the best of our knowledge, the first work exploring the
limitations of PPO in multi-robot systems when considering that different
robots might be exposed to different environments where their sensors or
actuators have induced errors. With the conclusions of this work, we set the
starting point for future work on designing and developing methods to achieve
robust reinforcement learning in the presence of real-world perturbations that
might differ within a multi-robot system.
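As a minimal illustration of the heterogeneity studied here (not the authors' exact setup), the sketch below injects per-robot sensing, calibration, and accuracy perturbations into observations and actions before they reach a shared PPO learner. The noise magnitudes, the fleet composition, and the tanh stand-in for the policy are assumptions for illustration only.

```python
import numpy as np

# Hypothetical per-robot perturbation models (illustrative values, not the
# paper's calibrated ones): sensor noise on observations, a constant
# calibration offset on joint commands, and reduced actuation accuracy.
class PerturbedRobot:
    def __init__(self, sensing_std=0.0, calib_offset=0.0, accuracy_std=0.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.sensing_std = sensing_std    # sensing mismatch (e.g., joint encoders)
        self.calib_offset = calib_offset  # systematic joint calibration error
        self.accuracy_std = accuracy_std  # random actuation error

    def observe(self, true_state):
        # Sensing mismatch: zero-mean noise added to the true state.
        return true_state + self.rng.normal(0.0, self.sensing_std, true_state.shape)

    def actuate(self, action):
        # Calibration mismatch (bias) plus accuracy mismatch (noise).
        return action + self.calib_offset + self.rng.normal(0.0, self.accuracy_std, action.shape)

# A fleet where only one robot is perturbed, mirroring the paper's study of
# how the number of perturbed agents affects the collaborative learning effort.
fleet = [PerturbedRobot(seed=i) for i in range(3)] + \
        [PerturbedRobot(sensing_std=0.01, calib_offset=0.02, accuracy_std=0.01, seed=99)]

# In a PPO loop (learner omitted), each robot's experience would pass through
# its own perturbation model before entering the shared policy update:
state = np.zeros(7)                  # e.g., 7-DoF Kuka joint positions
for robot in fleet:
    obs = robot.observe(state)       # what the policy sees
    action = np.tanh(obs)            # stand-in for policy(obs)
    command = robot.actuate(action)  # what the simulator actually executes
```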
Related papers
- Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions.
We propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes".
We evaluate our method on a simulated stair-climbing reinforcement learning task.
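One plausible (hypothetical) reading of this discretization step is clustering sampled continuous motions into a small set of prototypes; the sketch below uses plain k-means, with the feature choice and cluster count assumed for illustration.

```python
import numpy as np

def kmeans_prototypes(actions, k=8, iters=50, seed=0):
    """Cluster continuous actions into k "action prototypes" (plain k-means)."""
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), k, replace=False)]
    for _ in range(iters):
        # Assign each action to its nearest prototype.
        labels = np.argmin(((actions[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each prototype to the mean of its assigned actions.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = actions[labels == j].mean(axis=0)
    return centers

# Example: 2-D continuous motion commands -> 8 discrete prototypes.
motions = np.random.default_rng(1).uniform(-1, 1, size=(500, 2))
prototypes = kmeans_prototypes(motions)
```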
arXiv Detail & Related papers (2024-04-03T13:28:52Z)
- DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control [0.0]
Delayed Markov decision processes fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions.
We introduce a disturbance-augmented Markov decision process in delayed settings as a novel representation to incorporate disturbance estimation in training on-policy reinforcement learning algorithms.
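A minimal sketch of the state augmentation described above, assuming a fixed action-history window and a scalar disturbance estimate (the paper's exact construction may differ):

```python
from collections import deque
import numpy as np

class DisturbanceAugmentedState:
    """Augment the raw state with the last n committed actions (delayed MDP)
    plus a disturbance estimate, so the augmented process stays Markovian."""
    def __init__(self, action_dim, window=3):
        self.window = deque([np.zeros(action_dim)] * window, maxlen=window)

    def step(self, state, last_action, disturbance_estimate):
        self.window.append(last_action)  # record the committed action
        return np.concatenate([state,
                               np.concatenate(self.window),  # action history
                               np.atleast_1d(disturbance_estimate)])

aug = DisturbanceAugmentedState(action_dim=2, window=3)
s_aug = aug.step(np.zeros(4), np.array([0.1, -0.2]), 0.05)
assert s_aug.shape == (4 + 3 * 2 + 1,)
```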
arXiv Detail & Related papers (2023-06-15T10:11:38Z)
- Bridging Active Exploration and Uncertainty-Aware Deployment Using Probabilistic Ensemble Neural Network Dynamics [11.946807588018595]
This paper presents a unified model-based reinforcement learning framework that bridges active exploration and uncertainty-aware deployment.
The two opposing tasks of exploration and deployment are optimized through state-of-the-art sampling-based MPC.
We conduct experiments on both autonomous vehicles and wheeled robots, showing promising results for both exploration and deployment.
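One common way to realize this duality (shown here as an assumed sketch, not the paper's exact objective) is to reward ensemble disagreement during exploration and penalize it during deployment:

```python
import numpy as np

def ensemble_predict(models, state, action):
    """Next-state predictions from each member of a dynamics-model ensemble."""
    return np.stack([m(state, action) for m in models])

def exploration_bonus(preds):
    # High disagreement (epistemic uncertainty) -> informative to visit.
    return preds.std(axis=0).mean()

def deployment_cost(preds, goal, risk_weight=1.0):
    # Penalize expected distance to goal plus uncertainty (risk-averse MPC).
    mean_pred = preds.mean(axis=0)
    return np.linalg.norm(mean_pred - goal) + risk_weight * preds.std(axis=0).mean()

# Toy ensemble of linear "dynamics models" with slightly different weights.
rng = np.random.default_rng(0)
models = [lambda s, a, W=rng.normal(1.0, 0.05, 3): W * (s + a) for _ in range(5)]
preds = ensemble_predict(models, np.ones(3), np.full(3, 0.1))
print(exploration_bonus(preds), deployment_cost(preds, goal=np.zeros(3)))
```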
arXiv Detail & Related papers (2023-05-20T17:20:12Z)
- Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation [1.7901837062462316]
This paper aims to define and incorporate the natural symmetry present in physical robotic environments.
The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle.
A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.
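As an illustrative sketch (the mirror map here is an assumption; the paper defines symmetry for its specific arm and tasks), replay transitions can be augmented with their symmetric counterparts:

```python
import numpy as np

def mirror(x, axis=0):
    """Reflect a state or action about one axis of the workspace."""
    y = x.copy()
    y[axis] = -y[axis]
    return y

def augment_with_symmetry(buffer):
    """For each stored transition, also store its mirrored twin, doubling the
    effective data for an off-policy learner at no extra interaction cost."""
    mirrored = [(mirror(s), mirror(a), r, mirror(s2)) for (s, a, r, s2) in buffer]
    return buffer + mirrored

buffer = [(np.array([0.5, 0.2]), np.array([0.1, 0.0]), 1.0, np.array([0.6, 0.2]))]
print(len(augment_with_symmetry(buffer)))  # 2
```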
arXiv Detail & Related papers (2023-04-12T11:38:01Z)
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
Directly minimizing the loss of the ensemble, however, appears to be rarely applied in practice.
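The distinction at stake, in a hedged toy sketch with squared error: averaging per-member losses versus taking the loss of the averaged prediction (the joint objective, under which members can "collude" to cancel each other's errors):

```python
import numpy as np

preds = np.array([[2.0], [4.0]])  # two base learners' predictions
target = np.array([3.0])

# Independent training: average the members' individual losses -> 1.0
independent_loss = np.mean([(p - target) ** 2 for p in preds])
# Joint training: loss of the averaged prediction -> 0.0, even though
# both individual members are wrong (the collusion failure mode).
joint_loss = ((preds.mean(axis=0) - target) ** 2).item()
print(independent_loss, joint_loss)
```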
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
- Revisiting the Adversarial Robustness-Accuracy Tradeoff in Robot Learning [121.9708998627352]
Recent work has shown that, in practical robot learning applications, adversarial training does not offer a favorable trade-off.
This work revisits the robustness-accuracy trade-off in robot learning by analyzing if recent advances in robust training methods and theory can make adversarial training suitable for real-world robot applications.
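For concreteness, a standard FGSM-style observation attack (a generic sketch of adversarial training's inner step, not the specific robust-training methods evaluated in the paper):

```python
import numpy as np

def fgsm_observation(obs, grad_wrt_obs, epsilon=0.01):
    """Worst-case L-infinity perturbation of a robot's observation: step in
    the direction of the loss gradient's sign, bounded by epsilon."""
    return obs + epsilon * np.sign(grad_wrt_obs)

# Adversarial training then minimizes the loss on such perturbed observations,
# trading nominal accuracy for robustness.
obs = np.array([0.30, -0.12, 0.85])
grad = np.array([0.5, -2.0, 0.0])   # stand-in for dLoss/dObs from autodiff
print(fgsm_observation(obs, grad))  # [ 0.31 -0.13  0.85]
```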
arXiv Detail & Related papers (2022-04-15T08:12:15Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copulas, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
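A Gaussian-copula sketch of this separation between marginals and dependence, with the per-agent marginals assumed here purely for illustration:

```python
import numpy as np
from scipy.stats import norm

def sample_gaussian_copula(corr, n, seed=0):
    """Draw uniforms whose dependence follows a Gaussian copula with correlation corr."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    return norm.cdf(z)  # marginals become Uniform(0,1); dependence is preserved

# Two agents: coordination lives in the copula, individual behavior in marginals.
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
u = sample_gaussian_copula(corr, n=1000)

# Apply each agent's own (assumed) marginal via its inverse CDF:
agent0_actions = norm.ppf(u[:, 0], loc=0.0, scale=1.0)    # agent 0: N(0, 1)
agent1_actions = norm.ppf(u[:, 1], loc=0.5, scale=0.2)    # agent 1: N(0.5, 0.04)
print(np.corrcoef(agent0_actions, agent1_actions)[0, 1])  # roughly 0.8
```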
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction [55.569050872780224]
We present an online framework for safe crowd-robot interaction based on risk-sensitive optimal control, wherein the risk is modeled by the entropic risk measure.
Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control.
A simulation study and a real-world experiment show that the proposed framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
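The entropic risk measure has a compact closed form, rho_theta(X) = (1/theta) * log E[exp(theta * X)]; a small numerically stable sketch, with theta and the cost samples chosen here only for illustration:

```python
import numpy as np
from scipy.special import logsumexp

def entropic_risk(costs, theta=1.0):
    """rho_theta(X) = (1/theta) * log E[exp(theta * X)].
    theta > 0 makes the measure risk-averse: rare high costs dominate."""
    n = len(costs)
    return (logsumexp(theta * np.asarray(costs)) - np.log(n)) / theta

samples = [1.0, 1.0, 1.0, 10.0]           # mostly safe, one rare high cost
print(np.mean(samples))                   # 3.25 (risk-neutral expectation)
print(entropic_risk(samples, theta=2.0))  # ~9.3 (risk-averse, dominated by worst case)
```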
arXiv Detail & Related papers (2020-09-12T02:02:52Z)
- Ubiquitous Distributed Deep Reinforcement Learning at the Edge: Analyzing Byzantine Agents in Discrete Action Spaces [0.06554326244334865]
This paper discusses some of the challenges in multi-agent distributed deep reinforcement learning that can occur in the presence of byzantine or malfunctioning agents.
We show how wrong discrete actions can significantly affect the collaborative learning effort.
Experiments are carried out in a simulation environment using the Atari testbed for the discrete action spaces, and advantage actor-critic (A2C) for the distributed multi-agent training.
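A toy sketch of this failure mode (the flip probability and greedy action selection are assumptions; the actual experiments use A2C on Atari):

```python
import numpy as np

def byzantine_policy(action_logits, n_actions, flip_prob=1.0, rng=None):
    """A malfunctioning agent that replaces its greedy discrete action with
    a uniformly random one with probability flip_prob (1.0 = always wrong)."""
    if rng is None:
        rng = np.random.default_rng()
    greedy = int(np.argmax(action_logits))
    if rng.random() < flip_prob:
        return int(rng.integers(n_actions))  # wrong action enters shared experience
    return greedy

# In distributed A2C, each worker's (state, action, reward) stream feeds a
# shared learner, so even one Byzantine worker contaminates every update.
logits = np.array([0.1, 2.3, -0.7, 0.4])
print(byzantine_policy(logits, n_actions=4, rng=np.random.default_rng(0)))
```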
arXiv Detail & Related papers (2020-08-18T11:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.