DAN: Decentralized Attention-based Neural Network to Solve the MinMax
Multiple Traveling Salesman Problem
- URL: http://arxiv.org/abs/2109.04205v1
- Date: Thu, 9 Sep 2021 12:26:04 GMT
- Title: DAN: Decentralized Attention-based Neural Network to Solve the MinMax
Multiple Traveling Salesman Problem
- Authors: Yuhong Cao and Zhanhong Sun and Guillaume Sartoretti
- Abstract summary: We introduce a decentralized attention-based neural network method to solve the MinMax mTSP, named DAN.
In DAN, agents learn fully decentralized policies to collaboratively construct a tour, by predicting the future decisions of other agents.
We experimentally demonstrate our model on small- to large-scale mTSP instances, which involve 50 to 1000 cities and 5 to 20 agents, and compare against state-of-the-art baselines.
- Score: 5.137147284997655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The multiple traveling salesman problem (mTSP) is a well-known NP-hard
problem with numerous real-world applications. In particular, this work
addresses MinMax mTSP, where the objective is to minimize the max tour length
(sum of Euclidean distances) among all agents. The mTSP is normally considered
as a combinatorial optimization problem, but due to its computational
complexity, search-based exact and heuristic algorithms become inefficient as
the number of cities increases. Encouraged by the recent developments in deep
reinforcement learning (dRL), this work considers the mTSP as a cooperative
task and introduces a decentralized attention-based neural network method to
solve the MinMax mTSP, named DAN. In DAN, agents learn fully decentralized
policies to collaboratively construct a tour, by predicting the future
decisions of other agents. Our model relies on the Transformer architecture,
and is trained using multi-agent RL with parameter sharing, which provides
natural scalability to the numbers of agents and cities. We experimentally
demonstrate our model on small- to large-scale mTSP instances, which involve 50
to 1000 cities and 5 to 20 agents, and compare against state-of-the-art
baselines. For small-scale problems (fewer than 100 cities), DAN is able to
closely match the performance of the best solver available (OR Tools, a
meta-heuristic solver) given the same computation time budget. In larger-scale
instances, DAN outperforms both conventional and dRL-based solvers, while
keeping computation times low, and exhibits enhanced collaboration among
agents.
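Code sketch (illustrative): the following minimal Python snippet is not the authors' implementation, but it makes the MinMax objective and the decentralized, parameter-shared tour construction concrete. The learned attention-based Transformer policy is replaced by a hypothetical nearest-city stand-in (attention_score), and the round-robin decision order is an assumption; in DAN, agents instead apply a shared learned policy and condition on predictions of the other agents' future decisions.

import numpy as np

def tour_length(coords, tour):
    """Sum of Euclidean distances along a closed tour starting and ending at the depot (city 0)."""
    path = [0] + list(tour) + [0]
    return sum(np.linalg.norm(coords[a] - coords[b]) for a, b in zip(path, path[1:]))

def minmax_objective(coords, tours):
    """MinMax mTSP cost: the length of the longest individual tour among all agents."""
    return max(tour_length(coords, t) for t in tours)

def attention_score(agent_pos, city):
    """Hypothetical stand-in for DAN's learned attention-based policy: plain negative distance."""
    return -np.linalg.norm(agent_pos - city)

def greedy_decentralized_construction(coords, n_agents):
    """Each agent repeatedly picks its best unvisited city using the same (shared) scoring
    function, mimicking parameter sharing; turn order here is a simple round-robin assumption."""
    unvisited = set(range(1, len(coords)))
    tours = [[] for _ in range(n_agents)]
    positions = [coords[0] for _ in range(n_agents)]  # all agents start at the depot
    turn = 0
    while unvisited:
        a = turn % n_agents
        best = max(unvisited, key=lambda c: attention_score(positions[a], coords[c]))
        tours[a].append(best)
        positions[a] = coords[best]
        unvisited.remove(best)
        turn += 1
    return tours

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.random((50, 2))  # 50 cities in the unit square, city 0 is the depot
    tours = greedy_decentralized_construction(coords, n_agents=5)
    print("MinMax objective:", minmax_objective(coords, tours))

Running the sketch on a random 50-city, 5-agent instance prints the longest individual tour length, i.e. the quantity the MinMax mTSP objective minimizes.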
Related papers
- Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments [3.0284592792243794]
Bottom Up Network (BUN) treats the collective of agents as a unified entity.
Our empirical evaluations across a variety of cooperative multi-agent scenarios, including tasks such as cooperative navigation and traffic control, consistently demonstrate BUN's superiority over baseline methods with substantially reduced computational costs.
arXiv Detail & Related papers (2024-10-03T14:25:02Z) - iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning [8.747306544853961]
The paper considers a Min-Max Multiple Traveling Salesman Problem (MTSP).
The goal is to find a set of tours, one for each agent, to collectively visit all the cities while minimizing the length of the longest tour.
We introduce a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP).
arXiv Detail & Related papers (2024-05-01T02:26:13Z) - MASP: Scalable GNN-based Planning for Multi-Agent Navigation [17.788592987873905]
We propose a goal-conditioned hierarchical planner for navigation tasks with a substantial number of agents.
We also leverage graph neural networks (GNN) to model the interaction between agents and goals, improving goal achievement.
The results demonstrate that MASP outperforms classical planning-based competitors and RL baselines.
arXiv Detail & Related papers (2023-12-05T06:05:04Z) - Learning RL-Policies for Joint Beamforming Without Exploration: A Batch
Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of optimizing beamforming parameters in wireless networks.
We show that a policy can be learned from previously collected data and deployed in the real world without online exploration.
arXiv Detail & Related papers (2023-10-12T18:36:36Z) - Pointerformer: Deep Reinforced Multi-Pointer Transformer for the
Traveling Salesman Problem [67.32731657297377]
The Traveling Salesman Problem (TSP) is a classic routing optimization problem originally arising in the domain of transportation and logistics.
Recently, Deep Reinforcement Learning has been increasingly employed to solve TSP due to its high inference efficiency.
We propose a novel end-to-end DRL approach, referred to as Pointerformer, based on multi-pointer Transformer.
arXiv Detail & Related papers (2023-04-19T03:48:32Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Centralized Model and Exploration Policy for Multi-Agent RL [13.661446184763117]
Reinforcement learning in partially observable, fully cooperative multi-agent settings (Dec-POMDPs) can be used to address many real-world challenges.
Current RL algorithms for Dec-POMDPs suffer from poor sample complexity.
We propose a model-based algorithm, MARCO, and evaluate it on three cooperative communication tasks, where it improves sample efficiency by up to 20x.
arXiv Detail & Related papers (2021-07-14T00:34:08Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in
Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experimental results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and adapt well to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization at large scale, where the model is a deep neural network.
Our method requires far fewer communication rounds than prior approaches while retaining theoretical guarantees on the number of rounds.
Experiments on several datasets demonstrate the effectiveness of the method and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.