Efficient Planning in Combinatorial Action Spaces with Applications to
Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2302.04376v1
- Date: Wed, 8 Feb 2023 23:42:49 GMT
- Title: Efficient Planning in Combinatorial Action Spaces with Applications to
Cooperative Multi-Agent Reinforcement Learning
- Authors: Volodymyr Tkachuk, Seyed Alireza Bakhtiari, Johannes Kirschner, Matej
Jusup, Ilija Bogunovic, Csaba Szepesvári
- Abstract summary: In cooperative multi-agent reinforcement learning, a potentially large number of agents jointly optimize a global reward function, which leads to a combinatorial blow-up of the action space in the number of agents.
As a minimal requirement, we assume access to an argmax oracle that allows us to efficiently compute the greedy policy for any Q-function in the model class.
We propose efficient algorithms for this setting that lead to polynomial compute and query complexity in all relevant problem parameters.
- Score: 16.844525262228103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A practical challenge in reinforcement learning is combinatorial
action spaces, which make planning computationally demanding. For example, in
cooperative multi-agent reinforcement learning, a potentially large number of
agents jointly optimize a global reward function, which leads to a
combinatorial blow-up of the action space in the number of agents. As a minimal
requirement, we assume access to an argmax oracle that allows us to efficiently
compute the greedy policy for any Q-function in the model class. Building on
recent work in planning with local access to a simulator and linear function
approximation, we propose efficient algorithms for this setting that lead to
polynomial compute and query complexity in all relevant problem parameters. For
the special case where the feature decomposition is additive, we further
improve the bounds and extend the results to the kernelized setting with an
efficient algorithm.
Related papers
- Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization [17.392822956504848]
We propose PARCO, a reinforcement learning framework designed to construct high-quality solutions for multi-agent tasks efficiently.
PARCO integrates three key components: (1) transformer-based communication layers to enable effective agent collaboration during parallel solution construction, (2) a multiple pointer mechanism for low-latency, parallel agent decision-making, and (3) priority-based conflict handlers to resolve decision conflicts via learned priorities.
We evaluate PARCO in multi-agent vehicle routing and scheduling problems where our approach outperforms state-of-the-art learning methods and demonstrates strong generalization ability and remarkable computational efficiency.
arXiv Detail & Related papers (2024-09-05T17:49:18Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [55.2480439325792]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
We introduce an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor.
We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
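For reference, the linear-architecture setting above can be sketched with the textbook semi-gradient Q-learning update on a linear Q-function; this is the basic protocol, not the stabilized exploration variant the paper analyzes, and `phi` and all parameters are illustrative assumptions:

```python
import numpy as np

def q_learning_step(w, phi, s, a, r, s_next, actions, gamma=0.99, alpha=0.1):
    """One semi-gradient Q-learning update with linear function approximation.

    Q(s, a) is approximated as <w, phi(s, a)>, where `phi` is a
    hypothetical feature map. This is the standard update, not the
    stabilized variant studied in the paper.
    """
    q_sa = w @ phi(s, a)
    # Bootstrap target: reward plus discounted max over next-state actions.
    target = r + gamma * max(w @ phi(s_next, b) for b in actions)
    # Semi-gradient TD update on the weight vector.
    return w + alpha * (target - q_sa) * phi(s, a)
```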
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action spaces and state-dependent constraints.
Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep policy iteration method that leverages neural networks (NNs) to approximate the value function.
We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find that it outperforms existing methods by as much as 14.7% on average.
arXiv Detail & Related papers (2021-12-04T01:40:34Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Reinforcement Learning with Combinatorial Actions: An Application to
Vehicle Routing [9.995347522610674]
We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space.
We present an application of this framework to the capacitated vehicle routing problem (CVRP).
On each instance, we model an action as the construction of a single route, and consider a deterministic policy which is improved through a simple policy iteration algorithm.
arXiv Detail & Related papers (2020-10-22T19:32:21Z) - Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot
Transfer [71.44215606325005]
We propose a novel framework called Jump-Operator Dynamic Programming for quickly computing solutions within a super-exponential space of sequential sub-goal tasks.
This approach involves controlling an ensemble of reusable goal-conditioned policies functioning as temporally extended actions.
We then identify classes of objective functions on this subspace whose solutions are invariant to the grounding, resulting in optimal zero-shot transfer.
arXiv Detail & Related papers (2020-07-06T05:13:20Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal
Constraints [52.58352707495122]
We present a multi-robot task allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z) - A Novel Multi-Agent System for Complex Scheduling Problems [2.294014185517203]
This paper presents the conception and implementation of a multi-agent system that is applicable to various problem domains.
We simulate an NP-hard scheduling problem to demonstrate the validity of our approach.
The paper highlights the advantages of the agent-based approach, such as reduced layout complexity, improved control of complicated systems, and extensibility.
arXiv Detail & Related papers (2020-04-20T14:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.