Bandit Submodular Maximization for Multi-Robot Coordination in
Unpredictable and Partially Observable Environments
- URL: http://arxiv.org/abs/2305.12795v2
- Date: Fri, 26 May 2023 09:22:54 GMT
- Title: Bandit Submodular Maximization for Multi-Robot Coordination in
Unpredictable and Partially Observable Environments
- Authors: Zirui Xu, Xiaofeng Lin, Vasileios Tzoumas
- Abstract summary: We introduce the first submodular coordination algorithm with bandit feedback and bounded tracking regret.
Bandit feedback refers to the robots' ability to compute in hindsight only the effect of their chosen actions.
Tracking regret is the algorithm's suboptimality with respect to the optimal time-varying actions that fully know the future a priori.
- Score: 3.0294344089697596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of multi-agent coordination in unpredictable and
partially observable environments, that is, environments whose future evolution
is unknown a priori and that can only be partially observed. We are motivated
by the future of autonomy that involves multiple robots coordinating actions in
dynamic, unstructured, and partially observable environments to complete
complex tasks such as target tracking, environmental mapping, and area
monitoring. Such tasks are often modeled as submodular maximization
coordination problems due to the information overlap among the robots. We
introduce the first submodular coordination algorithm with bandit feedback and
bounded tracking regret -- bandit feedback is the robots' ability to compute in
hindsight only the effect of their chosen actions, instead of all the
alternative actions they could have chosen, due to the partial
observability; and tracking regret is the algorithm's suboptimality with
respect to the optimal time-varying actions that fully know the future a
priori. The bound gracefully degrades with the environments' capacity to change
adversarially, quantifying how often the robots should re-select actions to
learn to coordinate as if they fully knew the future a priori. The algorithm
generalizes the seminal Sequential Greedy algorithm by Fisher et al. to the
bandit setting, by leveraging submodularity and algorithms for the problem of
tracking the best action. We validate our algorithm in simulated scenarios of
multi-target tracking.
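The Sequential Greedy scheme by Fisher et al. that the abstract generalizes can be sketched as follows: robots choose actions in a fixed order, and each robot picks the action with the largest marginal gain given its predecessors' choices. This is a minimal illustrative sketch with a toy coverage objective, not the paper's bandit algorithm; all names and data here are hypothetical.

```python
def sequential_greedy(robots, actions, f):
    """Each robot, in a fixed order, greedily picks the action with the
    largest marginal gain of the team objective f, given the actions
    already chosen by its predecessors (Sequential Greedy)."""
    chosen = {}
    for r in robots:
        best_a, best_gain = None, float("-inf")
        for a in actions[r]:
            trial = dict(chosen, **{r: a})
            gain = f(trial.values()) - f(chosen.values())
            if gain > best_gain:
                best_a, best_gain = a, gain
        chosen[r] = best_a
    return chosen

# Toy coverage objective: each action covers a set of targets; the team
# value is the number of distinct targets covered (monotone submodular).
def coverage(action_sets):
    covered = set()
    for s in action_sets:
        covered |= set(s)
    return len(covered)

robots = ["r1", "r2"]
actions = {"r1": [frozenset({1, 2}), frozenset({3})],
           "r2": [frozenset({1, 2}), frozenset({3, 4})]}
plan = sequential_greedy(robots, actions, coverage)
# r2 avoids re-covering r1's targets and picks {3, 4} instead.
```

For monotone submodular objectives, this sequential scheme enjoys a constant-factor suboptimality guarantee; the paper's contribution is retaining a bounded (tracking) guarantee when each robot observes only the payoff of the action it actually played.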
Related papers
- Communication- and Computation-Efficient Distributed Decision-Making in Multi-Robot Networks [2.8936428431504164]
We provide a distributed coordination paradigm that enables scalable and near-optimal joint motion planning among multiple robots.
Our algorithm is up to two orders of magnitude faster than competitive near-optimal algorithms.
In simulations of surveillance tasks with up to 45 robots, it enables real-time planning at the order of 1 Hz with superior coverage performance.
arXiv Detail & Related papers (2024-07-15T01:25:39Z)
- Leveraging Untrustworthy Commands for Multi-Robot Coordination in Unpredictable Environments: A Bandit Submodular Maximization Approach [5.087424045458335]
We study the problem of multi-agent coordination in unpredictable environments with untrustworthy external commands.
We provide an algorithm, Meta Bandit Sequential Greedy, which enjoys performance guarantees even when the external commands are arbitrarily bad.
arXiv Detail & Related papers (2023-09-28T04:26:06Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Contribution to the Optimization of a Collective Behavior for a Group of Autonomous Robots [0.0]
This thesis studies the domain of collective robotics, and more particularly the optimization problems of multirobot systems.
The first contribution is the use of the Butterfly Optimization Algorithm (BOA) to solve the Unknown Area Exploration problem.
The second contribution is the development of a new simulation framework for benchmarking dynamic incremental problems in robotics.
arXiv Detail & Related papers (2023-06-10T21:49:08Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
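Lattice's specific mechanism injects correlated noise into the policy network's latent state; as a generic stand-in for the underlying idea, temporally-correlated exploration noise is commonly generated with an Ornstein-Uhlenbeck process. A hedged sketch (parameters are illustrative, not the paper's):

```python
import random

def ou_noise(steps, theta=0.15, sigma=0.2, dt=1.0, seed=0):
    """Ornstein-Uhlenbeck process: successive samples are correlated
    (each step decays the previous value and adds Gaussian noise),
    unlike i.i.d. Gaussian exploration noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(steps):
        x += -theta * x * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

noise = ou_noise(1000)
```

The correlation makes consecutive exploratory actions coherent over time, which tends to produce smoother behavior than white noise.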
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Online Submodular Coordination with Bounded Tracking Regret: Theory, Algorithm, and Applications to Multi-Robot Coordination [15.588080817106563]
We are motivated by the future autonomy that involves multiple robots coordinating in dynamic, unstructured, and adversarial environments.
We introduce the first submodular coordination algorithm with bounded tracking regret, i.e., with bounded suboptimality with respect to optimal time-varying actions that know the future a priori.
Our algorithm generalizes the seminal Sequential Greedy algorithm by Fisher et al. to unpredictable environments, leveraging submodularity and algorithms for the problem of tracking the best expert.
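The "tracking the best expert" subroutine referenced above is classically an algorithm like Fixed-Share (Herbster and Warmuth): exponential weights plus a small uniform "sharing" step so the learner can follow a best expert that changes over time. A minimal sketch with illustrative parameters, not the paper's implementation:

```python
import math

def fixed_share(rewards, eta=0.5, alpha=0.1):
    """Fixed-Share expert tracking: multiplicative-weights update on
    observed rewards, then mix a fraction alpha of the weight uniformly
    so no expert's weight ever vanishes (enabling tracking)."""
    n = len(rewards[0])            # number of experts (actions)
    w = [1.0 / n] * n              # current distribution over experts
    history = []
    for r in rewards:              # r[i] = reward of expert i this round
        history.append(list(w))
        # Multiplicative update on the observed rewards.
        w = [wi * math.exp(eta * ri) for wi, ri in zip(w, r)]
        total = sum(w)
        w = [wi / total for wi in w]
        # Share a fraction alpha uniformly to keep all experts alive.
        w = [(1 - alpha) * wi + alpha / n for wi in w]
    return history, w

# Expert 0 is best early, expert 1 best later; the weight should shift.
rewards = [[1, 0]] * 5 + [[0, 1]] * 5
history, final = fixed_share(rewards)
```

The sharing step is what bounds *tracking* regret rather than static regret: without it, the weight of a long-dormant expert decays so far that the learner cannot switch to it quickly.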
arXiv Detail & Related papers (2022-09-26T05:31:34Z)
- SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z)
- Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures.
We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view.
We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
arXiv Detail & Related papers (2021-01-14T22:21:25Z)
- Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot Transfer [71.44215606325005]
We propose a novel framework called Jump-Operator Dynamic Programming for quickly computing solutions within a super-exponential space of sequential sub-goal tasks.
This approach involves controlling an ensemble of reusable goal-conditioned policies functioning as temporally extended actions.
We then identify classes of objective functions on this subspace whose solutions are invariant to the grounding, resulting in optimal zero-shot transfer.
arXiv Detail & Related papers (2020-07-06T05:13:20Z)
- Thinking While Moving: Deep Reinforcement Learning with Concurrent Control [122.49572467292293]
We study reinforcement learning in settings where sampling an action from the policy must be done concurrently with the time evolution of the controlled system.
Much like a person or an animal, the robot must think and move at the same time, deciding on its next action before the previous one has completed.
arXiv Detail & Related papers (2020-04-13T17:49:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.