Improving Generalization of Deep Reinforcement Learning-based TSP Solvers
- URL: http://arxiv.org/abs/2110.02843v1
- Date: Wed, 6 Oct 2021 15:16:19 GMT
- Title: Improving Generalization of Deep Reinforcement Learning-based TSP Solvers
- Authors: Wenbin Ouyang, Yisen Wang, Shaochen Han, Zhejian Jin and Paul Weng
- Abstract summary: We propose a novel approach named MAGIC that includes a deep learning architecture and a DRL training method.
Our architecture, which integrates a multilayer perceptron, a graph neural network, and an attention model, defines a stochastic policy that sequentially generates a TSP solution.
Our training method includes several innovations: (1) we interleave DRL policy gradient updates with local search (using a new local search technique), (2) we use a novel simple baseline, and (3) we apply curriculum learning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work applying deep reinforcement learning (DRL) to solve traveling
salesman problems (TSP) has shown that DRL-based solvers can be fast and
competitive with TSP heuristics for small instances, but do not generalize well
to larger instances. In this work, we propose a novel approach named MAGIC that
includes a deep learning architecture and a DRL training method. Our
architecture, which integrates a multilayer perceptron, a graph neural network,
and an attention model, defines a stochastic policy that sequentially generates
a TSP solution. Our training method includes several innovations: (1) we
interleave DRL policy gradient updates with local search (using a new local
search technique), (2) we use a novel simple baseline, and (3) we apply
curriculum learning. Finally, we empirically demonstrate that MAGIC is superior
to other DRL-based methods on random TSP instances, both in terms of
performance and generalizability. Moreover, our method compares favorably
against TSP heuristics and other state-of-the-art approaches in terms of
performance and computational time.
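The abstract describes the training method only at a high level. Below is a minimal sketch, in PyTorch-style Python, of how interleaving policy-gradient updates with local search and curriculum learning might look; the `policy.sample` interface is a hypothetical stand-in, and plain 2-opt is substituted for the paper's own (new) local search technique.

```python
import torch

def tour_length(tour, coords):
    """Total Euclidean length of a closed tour over 2-D city coordinates."""
    pts = coords[tour]  # tour is a list of city indices
    return (pts - pts.roll(-1, dims=0)).norm(dim=1).sum()

def two_opt(tour, coords):
    """2-opt local search: reverse any segment that shortens the tour.
    A generic stand-in for the paper's local search technique."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, coords) < tour_length(tour, coords):
                    tour, improved = cand, True
    return tour

def train_step(policy, optimizer, coords):
    """REINFORCE-style update where the locally improved tour's length
    serves as a simple baseline for the sampled tour's cost."""
    tour, log_probs = policy.sample(coords)  # hypothetical: tour + log-probs
    baseline = tour_length(two_opt(tour, coords), coords)
    advantage = tour_length(tour, coords) - baseline
    loss = advantage.detach() * log_probs.sum()
    optimizer.zero_grad(); loss.backward(); optimizer.step()

def train(policy, optimizer, sizes=(20, 50, 100), steps_per_size=1000):
    """Curriculum learning: progress from small to larger random instances."""
    for n in sizes:
        for _ in range(steps_per_size):
            train_step(policy, optimizer, torch.rand(n, 2))
```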
Related papers
- Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem [67.32731657297377]
The Traveling Salesman Problem (TSP) is a classic routing optimization problem originally arising in the domain of transportation and logistics.
Recently, Deep Reinforcement Learning has been increasingly employed to solve TSP due to its high inference efficiency.
We propose a novel end-to-end DRL approach, referred to as Pointerformer, based on multi-pointer Transformer.
arXiv Detail & Related papers (2023-04-19T03:48:32Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- Deep Reinforcement Learning Guided Improvement Heuristic for Job Shop Scheduling [30.45126420996238]
This paper proposes a novel DRL-guided improvement heuristic for solving JSSP, where a graph representation is employed to encode complete solutions.
We design a Graph-Neural-Network-based representation scheme consisting of two modules that effectively capture the dynamic topology and the different node types encountered in the graphs during the improvement process.
We prove that our method scales linearly with problem size. Experiments on classic benchmarks show that the improvement policy learned by our method outperforms state-of-the-art DRL-based methods by a large margin.
arXiv Detail & Related papers (2022-11-20T10:20:13Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves performance competitive with existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Generalization in Deep RL for TSP Problems via Equivariance and Local Search [21.07325126324399]
We propose a simple deep learning architecture trained with novel RL training techniques.
We empirically evaluate our approach on random and realistic TSP problems against relevant state-of-the-art deep RL methods.
arXiv Detail & Related papers (2021-10-07T16:20:37Z)
- POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) aims to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance that leverages expert demonstrations to improve SRL interpretations.
We empirically verify that POAR efficiently handles high-dimensional tasks and facilitates training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, while also validating previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
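The two SUNRISE ingredients are concrete enough to sketch. Below is a minimal PyTorch illustration; the `q(state, actions)` callables and the batch layout are hypothetical stand-ins, while the weighting form sigmoid(-std * T) + 0.5 follows the weighted Bellman backup described in the SUNRISE paper.

```python
import torch

def ucb_action(q_ensemble, state, actions, lam=1.0):
    """UCB exploration: score each candidate action by mean + lam * std of
    the Q-ensemble's estimates and pick the highest upper-confidence bound."""
    qs = torch.stack([q(state, actions) for q in q_ensemble])  # [K, A]
    return actions[(qs.mean(dim=0) + lam * qs.std(dim=0)).argmax()]

def weighted_backup_loss(q, target_ensemble, batch, gamma=0.99, T=10.0):
    """Ensemble-based weighted Bellman backup: transitions whose target
    Q-value disagrees across the target ensemble get a weight near 0.5,
    confident ones a weight near 1."""
    s, a, r, s2, a2, done = batch
    with torch.no_grad():
        tq = torch.stack([qt(s2, a2) for qt in target_ensemble])  # [K, B]
        target = r + gamma * (1.0 - done) * tq.mean(dim=0)
        weight = torch.sigmoid(-tq.std(dim=0) * T) + 0.5          # in (0.5, 1]
    return (weight * (q(s, a) - target).pow(2)).mean()
```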