Online Submodular Coordination with Bounded Tracking Regret: Theory,
Algorithm, and Applications to Multi-Robot Coordination
- URL: http://arxiv.org/abs/2209.12429v1
- Date: Mon, 26 Sep 2022 05:31:34 GMT
- Title: Online Submodular Coordination with Bounded Tracking Regret: Theory,
Algorithm, and Applications to Multi-Robot Coordination
- Authors: Zirui Xu and Hongyu Zhou and Vasileios Tzoumas
- Abstract summary: We are motivated by the future of autonomy that involves multiple robots coordinating in dynamic, unstructured, and adversarial environments.
We introduce the first submodular coordination algorithm with bounded tracking regret, i.e., with bounded suboptimality with respect to optimal time-varying actions that know the future a priori.
Our algorithm generalizes the seminal Sequential Greedy algorithm by Fisher et al. to unpredictable environments, leveraging submodularity and algorithms for the problem of tracking the best expert.
- Score: 15.588080817106563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We enable efficient and effective coordination in unpredictable environments,
i.e., in environments whose future evolution is unknown a priori and even
adversarial. We are motivated by the future of autonomy that involves multiple
robots coordinating in dynamic, unstructured, and adversarial environments to
complete complex tasks such as target tracking, image covering, and area
monitoring. Such tasks are often modeled as submodular maximization
coordination problems. We thus introduce the first submodular coordination
algorithm with bounded tracking regret, i.e., with bounded suboptimality with
respect to optimal time-varying actions that know the future a priori. The
bound gracefully degrades with the environments' capacity to change
adversarially. It also quantifies how often the robots must re-select actions
to "learn" to coordinate as if they knew the future a priori. Our algorithm
generalizes the seminal Sequential Greedy algorithm by Fisher et al. to
unpredictable environments, leveraging submodularity and algorithms for the
problem of tracking the best expert. We validate our algorithm in simulated
scenarios of target tracking.
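The abstract's two ingredients, sequential greedy selection and a tracking-the-best-expert subroutine, compose naturally. Below is a minimal sketch, assuming per-robot finite action sets, gains normalized to [0, 1], and a user-supplied marginal-gain oracle; the Fixed Share update (Herbster & Warmuth) is one standard expert algorithm with tracking-regret guarantees, though the paper's exact construction may differ.

```python
import numpy as np

def fixed_share_update(w, losses, eta=0.5, alpha=0.05):
    """One Fixed Share step: exponential reweighting followed by uniform
    sharing; the sharing is what yields tracking (rather than static)
    regret guarantees."""
    w = w * np.exp(-eta * losses)
    w = w / w.sum()
    return (1 - alpha) * w + alpha / len(w)

def online_sequential_greedy(robots, actions, marginal_gain, horizon, rng):
    """Sketch of an online sequential greedy round: robot i samples from
    its expert weights after robots 1..i-1 commit, then every robot
    updates its weights once the round's objective is revealed."""
    weights = {i: np.ones(len(actions[i])) / len(actions[i]) for i in robots}
    for t in range(horizon):
        committed = []                                 # [(robot, action), ...]
        for i in robots:
            k = rng.choice(len(actions[i]), p=weights[i])
            committed.append((i, actions[i][k]))
        for idx, i in enumerate(robots):
            prefix = [a for _, a in committed[:idx]]   # predecessors' actions
            gains = np.array([marginal_gain(t, a, prefix) for a in actions[i]])
            weights[i] = fixed_share_update(weights[i], 1.0 - gains)
        yield committed
```

The key departure from the classical (offline) Sequential Greedy is that each robot hedges over its whole action set instead of committing to the single best marginal gain, which is what the expert machinery needs in order to bound tracking regret.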
Related papers
- Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design [3.5527561584422465]
We present the AlterNAting COordination and Network-Design Algorithm (Anaconda).
Anaconda is a scalable algorithm that also enjoys near-optimality guarantees.
We demonstrate Anaconda in simulated scenarios of area monitoring and compare it with a state-of-the-art algorithm.
arXiv Detail & Related papers (2024-09-02T18:11:33Z)
- Robotic warehousing operations: a learn-then-optimize approach to large-scale neighborhood search [84.39855372157616]
This paper supports robotic parts-to-picker operations in warehousing by optimizing order-workstation assignments, item-pod assignments and the schedule of order fulfillment at workstations.
We solve it via large-scale neighborhood search, with a novel learn-then-optimize approach to subproblem generation.
In collaboration with Amazon Robotics, we show that our model and algorithm generate much stronger solutions for practical problems than state-of-the-art approaches.
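As a rough illustration of the learn-then-optimize pattern (hypothetical interfaces, not the paper's implementation): a learned scorer ranks candidate subproblems, and an exact solver re-optimizes only the highest-ranked neighborhood each iteration.

```python
def learn_then_optimize_lns(start, propose_subproblems, score_model,
                            reoptimize, iterations=100):
    """Hypothetical learn-then-optimize LNS skeleton: the model predicts
    which neighborhood is most promising; the solver repairs only that
    one, keeping each iteration tractable at warehouse scale."""
    best = start
    for _ in range(iterations):
        candidates = propose_subproblems(best)          # e.g. sets of orders/pods to free
        sub = max(candidates, key=lambda s: score_model(best, s))
        repaired = reoptimize(best, sub)                # exact solve over the freed variables
        if repaired.objective < best.objective:         # assuming minimization
            best = repaired
    return best
```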
arXiv Detail & Related papers (2024-08-29T20:22:22Z)
- Communication- and Computation-Efficient Distributed Decision-Making in Multi-Robot Networks [2.8936428431504164]
We provide a distributed coordination paradigm that enables scalable and near-optimal joint motion planning among multiple robots.
Our algorithm is up to two orders of magnitude faster than competitive near-optimal algorithms.
In simulations of surveillance tasks with up to 45 robots, it enables real-time planning on the order of 1 Hz with superior coverage performance.
arXiv Detail & Related papers (2024-07-15T01:25:39Z)
- Leveraging Untrustworthy Commands for Multi-Robot Coordination in Unpredictable Environments: A Bandit Submodular Maximization Approach [5.087424045458335]
We study the problem of multi-agent coordination in unpredictable environments with untrustworthy external commands.
We provide an algorithm, Meta Bandit Sequential Greedy, which enjoys performance guarantees even when the external commands are arbitrarily bad.
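The summary does not spell out the mechanism, but one plausible way (not necessarily the paper's) to stay robust to arbitrarily bad commands is to hedge, with multiplicative weights, between a "follow the command" meta-expert and an "ignore it and act greedily" meta-expert; all names below are illustrative.

```python
import numpy as np

def robust_to_commands(rounds, command_action, greedy_action, reward,
                       eta=0.3, seed=0):
    """Multiplicative-weights hedge over two meta-experts; if the external
    commands are adversarial, their expert loses weight exponentially fast
    and play converges to the robots' own greedy choices."""
    rng = np.random.default_rng(seed)
    w = np.ones(2)                       # [follow command, act greedily]
    for t in range(rounds):
        p = w / w.sum()
        options = (command_action(t), greedy_action(t))
        play = options[rng.choice(2, p=p)]
        w *= np.exp(eta * np.array([reward(t, o) for o in options]))  # rewards in [0, 1]
        yield play
```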
arXiv Detail & Related papers (2023-09-28T04:26:06Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Bandit Submodular Maximization for Multi-Robot Coordination in Unpredictable and Partially Observable Environments [3.0294344089697596]
We introduce the first submodular coordination algorithm with bandit feedback and tracking regret.
Bandit feedback means the robots can compute in hindsight only the effect of their chosen actions.
Tracking regret is the algorithm's suboptimality with respect to the optimal time-varying actions that fully know the future a priori.
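For concreteness, one common formalization consistent with this summary (the paper's exact definition may differ): with per-round submodular objectives $f_t$ and $N$ robots,

```latex
\mathrm{TrackingRegret}_T \;=\;
  \sum_{t=1}^{T} f_t\!\left(a^{\star}_{1,t},\dots,a^{\star}_{N,t}\right)
  \;-\;
  \sum_{t=1}^{T} f_t\!\left(a_{1,t},\dots,a_{N,t}\right),
```

where $(a^{\star}_{1,t},\dots,a^{\star}_{N,t})$ is the optimal time-varying assignment chosen with full knowledge of the future and $a_{i,t}$ is robot $i$'s played action. Under bandit feedback, only $f_t(a_{1,t},\dots,a_{N,t})$ is observed, never the counterfactual values; greedy-based algorithms typically bound an approximate variant in which the comparator sum is scaled by a constant factor.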
arXiv Detail & Related papers (2023-05-22T07:44:54Z)
- SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of the future state uncertainty considered in the SMPC finite-time-horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
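Pieced together, the described architecture looks roughly like the following sketch, reconstructed from the summary alone with entirely hypothetical interfaces.

```python
def saber_step(state, goal, dqn, rnn, smpc):
    """One planning step of the described hierarchy: the DQN proposes a
    subgoal, the RNN gives a fast uncertainty estimate, and SMPC returns
    dynamics-feasible inputs with chance-constrained obstacle avoidance."""
    target = dqn.select_subgoal(state, goal)         # high-level: where to head next
    sigma = rnn.predict_uncertainty(state.history)   # future-state uncertainty over the horizon
    return smpc.solve(state, target, uncertainty=sigma)

def run_team(robots, goal, dqn, rnn, smpc, steps=1000):
    for _ in range(steps):
        for r in robots:                             # heterogeneous team, shared global goal
            r.apply(saber_step(r.state, goal, dqn, rnn, smpc))
```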
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
- Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot Transfer [71.44215606325005]
We propose a novel framework called Jump-Operator Dynamic Programming for quickly computing solutions within a super-exponential space of sequential sub-goal tasks.
This approach involves controlling an ensemble of reusable goal-conditioned policies that function as temporally extended actions.
We then identify classes of objective functions on this subspace whose solutions are invariant to the grounding, resulting in optimal zero-shot transfer.
arXiv Detail & Related papers (2020-07-06T05:13:20Z)
- Continuous Control for Searching and Planning with a Learned Model [5.196149362684628]
Decision-making agents with planning capabilities have achieved huge success in challenging domains such as Chess, Shogi, and Go.
Researchers proposed the MuZero algorithm, which learns a dynamics model through interactions with the environment.
We show the proposed algorithm outperforms the soft actor-critic (SAC) algorithm, a state-of-the-art model-free deep reinforcement learning algorithm.
arXiv Detail & Related papers (2020-06-12T19:10:41Z)
- Thinking While Moving: Deep Reinforcement Learning with Concurrent Control [122.49572467292293]
We study reinforcement learning in settings where sampling an action from the policy must be done concurrently with the time evolution of the controlled system.
Much like a person or an animal, the robot must think and move at the same time, deciding on its next action before the previous one has completed.
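A minimal sketch of the concurrent setting (hypothetical `env`/`policy` API): the next action is computed in a background thread while the current one executes, so every decision is made from a slightly stale observation.

```python
import concurrent.futures as cf

def concurrent_control(env, policy, episode_len=200):
    """The policy 'thinks' about the next action while the robot is still
    'moving' through the current one; by the time the action is applied,
    the state has evolved past the observation it was computed from."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    obs = env.reset()
    action = policy(obs)
    for _ in range(episode_len):
        future = pool.submit(policy, obs)  # start deciding the next action now
        obs = env.execute(action)          # meanwhile, the body keeps moving
        action = future.result()           # decided from a now-stale observation
    pool.shutdown()
```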
arXiv Detail & Related papers (2020-04-13T17:49:29Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)