Related papers: Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC

Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC

URL: http://arxiv.org/abs/2412.19669v1
Date: Fri, 27 Dec 2024 14:31:52 GMT
Title: Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPC
Authors: Xinglong Zhang, Wei Pan, Cong Li, Xin Xu, Xiangke Wang, Ronghua Zhang, Dewen Hu,
Abstract summary: This article proposes a novel distributed learning-based predictive control (DLPC) framework for scalable multirobot control.<n>Unlike conventional DMPC methods that calculate open-loop control sequences, our approach generates explicit closed-loop DMPC policies for MRS without using numerical solvers.<n>The learned control policies could be deployed online to MRS with varying robot scales, enhancing scalability and transferability for large-scale MRS.
Score: 22.644778818620185
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Distributed model predictive control (DMPC) is promising in achieving optimal cooperative control in multirobot systems (MRS). However, real-time DMPC implementation relies on numerical optimization tools to periodically calculate local control sequences online. This process is computationally demanding and lacks scalability for large-scale, nonlinear MRS. This article proposes a novel distributed learning-based predictive control (DLPC) framework for scalable multirobot control. Unlike conventional DMPC methods that calculate open-loop control sequences, our approach centers around a computationally fast and efficient distributed policy learning algorithm that generates explicit closed-loop DMPC policies for MRS without using numerical solvers. The policy learning is executed incrementally and forward in time in each prediction interval through an online distributed actor-critic implementation. The control policies are successively updated in a receding-horizon manner, enabling fast and efficient policy learning with the closed-loop stability guarantee. The learned control policies could be deployed online to MRS with varying robot scales, enhancing scalability and transferability for large-scale MRS. Furthermore, we extend our methodology to address the multirobot safe learning challenge through a force field-inspired policy learning approach. We validate our approach's effectiveness, scalability, and efficiency through extensive experiments on cooperative tasks of large-scale wheeled robots and multirotor drones. Our results demonstrate the rapid learning and deployment of DMPC policies for MRS with scales up to 10,000 units.

Related papers

Intersection of Reinforcement Learning and Bayesian Optimization for Intelligent Control of Industrial Processes: A Safe MPC-based DPG using Multi-Objective BO [0.0]
Model Predictive Control (MPC)-based Reinforcement Learning (RL) offers a structured and interpretable alternative to Deep Neural Network (DNN)-based RL methods.<n>Standard MPC-RL approaches often suffer from slow convergence, suboptimal policy learning due to limited parameterization, and safety issues during online adaptation.<n>We propose a novel framework that integrates MPC-RL with Multi-Objective Bayesian Optimization (MOBO)
arXiv Detail & Related papers (2025-07-14T02:31:52Z)
What Matters for Batch Online Reinforcement Learning in Robotics? [65.06558240091758]
The ability to learn from large batches of autonomously collected data for policy improvement holds the promise of enabling truly scalable robot learning.<n>Previous works have applied imitation learning and filtered imitation learning methods to the batch online RL problem.<n>We analyze how these axes affect performance and scaling with the amount of autonomous data.
arXiv Detail & Related papers (2025-05-12T21:24:22Z)
Bootstrapped Model Predictive Control [19.652808098339644]
We introduce Bootstrapped Model Predictive Control (BMPC), a novel algorithm that performs policy learning in a bootstrapped manner. BMPC learns a network policy by imitating an MPC expert, and in turn, uses this policy to guide the MPC process. Our method achieves superior performance over prior works on diverse continuous control tasks.
arXiv Detail & Related papers (2025-03-24T16:46:36Z)
Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining. We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU) Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z)
Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs [42.220568722735095]
Imitation learning (IL) can train computationally-efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC) We propose a data augmentation (DA) strategy that enables efficient learning of vision-based policies. We show 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods.
arXiv Detail & Related papers (2023-11-23T18:54:25Z)
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
offline reinforcement learning (RL) paradigm provides recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. In this paper, we propose an adaptive scheme for action quantization. We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet. To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
Model Predictive Control via On-Policy Imitation Learning [28.96122879515294]
We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control. Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
arXiv Detail & Related papers (2022-10-17T16:06:06Z)
Training Efficient Controllers via Analytic Policy Gradient [44.0762454494769]
Control design for robotic systems is complex and often requires solving an optimization to follow a trajectory accurately. Online optimization approaches like Model Predictive Control (MPC) have been shown to achieve great tracking performance, but require high computing power. We propose an Analytic Policy Gradient (APG) method to tackle this problem.
arXiv Detail & Related papers (2022-09-26T22:04:35Z)
Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework for MPC. Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies. Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot. We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control. We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO) We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.