Distributed Reinforcement Learning for Cooperative Multi-Robot Object
Manipulation
- URL: http://arxiv.org/abs/2003.09540v1
- Date: Sat, 21 Mar 2020 00:43:54 GMT
- Title: Distributed Reinforcement Learning for Cooperative Multi-Robot Object
Manipulation
- Authors: Guohui Ding, Joewie J. Koh, Kelly Merckaert, Bram Vanderborght, Marco
M. Nicotra, Christoffer Heckman, Alessandro Roncone, Lijun Chen
- Abstract summary: We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL)
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL)
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
- Score: 53.262360083572005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider solving a cooperative multi-robot object manipulation task using
reinforcement learning (RL). We propose two distributed multi-agent RL
approaches: distributed approximate RL (DA-RL), where each agent applies
Q-learning with individual reward functions; and game-theoretic RL (GT-RL),
where the agents update their Q-values based on the Nash equilibrium of a
bimatrix Q-value game. We validate the proposed approaches in the setting of
cooperative object manipulation with two simulated robot arms. Although we
focus on a small system of two agents in this paper, both DA-RL and GT-RL apply
to general multi-agent systems, and are expected to scale well to large
systems.
Related papers
- MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2)<n>We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search.<n>We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z) - AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework [76.96794548655292]
Large language models (LLMs) have sparked growing interest in building generalist agents that can learn through online interactions.<n>Applying reinforcement learning (RL) to train LLM agents in multi-turn, multi-task settings remains challenging due to lack of scalable infrastructure and stable training algorithms.<n>We present the AgentRL framework for scalable multi-turn, multi-task agentic RL training.
arXiv Detail & Related papers (2025-10-05T13:40:01Z) - AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [129.44038804430542]
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL.<n>We propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization.<n>Our agents match or surpass commercial models on 27 tasks across diverse environments.
arXiv Detail & Related papers (2025-09-10T16:46:11Z) - RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning [125.65034908728828]
Training large language models (LLMs) as interactive agents presents unique challenges.
While reinforcement learning has enabled progress in static tasks, multi-turn agent RL training remains underexplored.
We propose StarPO, a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents.
arXiv Detail & Related papers (2025-04-24T17:57:08Z) - MALT: Improving Reasoning with Multi-Agent LLM Training [66.9481561915524]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps.<n>On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z) - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion model (DM) recently achieved huge success in various scenarios including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework to tackle this problem.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z) - Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
One agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z) - Group-Agent Reinforcement Learning [12.915860504511523]
It can largely benefit the reinforcement learning process of each agent if multiple geographically distributed agents perform their separate RL tasks cooperatively.
We propose a distributed RL framework called DDAL (Decentralised Distributed Asynchronous Learning) designed for group-agent reinforcement learning (GARL)
arXiv Detail & Related papers (2022-02-10T16:40:59Z) - Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $epsilon$-approximate Nash equilibrium policy in at most $proptowidetildeO (1/epsilon4)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:45:12Z) - The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces [36.097537237660234]
We propose an algorithm that can provably find the Nash equilibrium policy using a number of samples.
A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness.
Our theoretical framework is generic, which applies to a wide range of models including but not limited to MGs, MGs with linear or kernel function approximation, and MGs with rich observations.
arXiv Detail & Related papers (2021-06-07T05:39:09Z) - MALib: A Parallel Framework for Population-based Multi-agent
Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to the series of methods nested with reinforcement learning (RL) algorithms.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z) - Scalable Multi-Agent Inverse Reinforcement Learning via
Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z) - Extended Markov Games to Learn Multiple Tasks in Multi-Agent
Reinforcement Learning [7.332887338089177]
We formally define Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications.
Specifically, we use our model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe specifications.
arXiv Detail & Related papers (2020-02-14T12:37:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.