A Game-Theoretic Approach for Improving Generalization Ability of TSP
Solvers
- URL: http://arxiv.org/abs/2110.15105v2
- Date: Fri, 29 Oct 2021 09:06:30 GMT
- Title: A Game-Theoretic Approach for Improving Generalization Ability of TSP
Solvers
- Authors: Chenguang Wang, Yaodong Yang, Oliver Slumbers, Congying Han, Tiande
Guo, Haifeng Zhang, Jun Wang
- Abstract summary: We introduce a two-player zero-sum framework between a trainable Solver and a Data Generator.
We show that our framework achieves the most generalizable performance on different Traveling Salesman Problem (TSP) tasks.
- Score: 16.98434288039677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we shed new light on the generalization ability of deep
learning-based solvers for Traveling Salesman Problems (TSP). Specifically, we
introduce a two-player zero-sum framework between a trainable Solver and
a Data Generator, where the Solver aims to solve the task instances
provided by the Generator, and the Generator aims to generate increasingly
difficult instances for improving the Solver. Grounded in Policy Space
Response Oracle (PSRO) methods, our two-player framework outputs a population
of best-responding Solvers, over which we can mix and output a combined model
that achieves the least exploitability against the Generator, and thereby the
most generalizable performance on different TSP tasks. We conduct experiments
on a variety of TSP instances of different types and sizes. Results suggest
that our Solvers achieve state-of-the-art performance even on tasks the
Solver has never seen, whilst the performance of other deep learning-based Solvers
drops sharply due to over-fitting. On real-world instances from
TSPLib, our method also attains a 12% improvement, in terms
of optimality gap, over the best baseline model. To demonstrate the principle of
our framework, we study the learning outcome of the proposed two-player game
and demonstrate that the exploitability of the Solver population decreases
during training, and it eventually approximates the Nash equilibrium along with
the Generator.
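The population loop described in the abstract can be sketched with a toy zero-sum meta-game in place of real TSP training. The sketch below is illustrative only: rock-paper-scissors stands in for the Solver-vs-Generator game, each iteration adds one best response to the opponent's uniform mixture (the simplest PSRO meta-strategy, i.e. fictitious play), and exploitability of the Solver mixture is tracked. None of this reflects the paper's actual neural implementation.

```python
# Toy PSRO-style loop: populations of pure strategies grow by best responses,
# and the uniform Solver mixture becomes harder to exploit over iterations.
ACTIONS = (0, 1, 2)
LOSS = [[0, 1, -1],            # LOSS[s][g]: Solver's loss = Generator's payoff
        [-1, 0, 1],
        [1, -1, 0]]

def avg_loss(s, gen_pop):
    # Solver strategy s against the uniform mixture over the Generator population
    return sum(LOSS[s][g] for g in gen_pop) / len(gen_pop)

def exploitability(sol_pop):
    # Payoff of the best pure Generator response to the uniform Solver mixture;
    # since the game's value is 0, this is exactly the exploitability.
    return max(sum(LOSS[s][g] for s in sol_pop) / len(sol_pop) for g in ACTIONS)

sol_pop, gen_pop = [0], [0]
trace = [exploitability(sol_pop)]
for _ in range(300):
    # Each side appends a best response to the other's current population.
    sol_pop.append(min(ACTIONS, key=lambda s: avg_loss(s, gen_pop)))
    gen_pop.append(max(ACTIONS, key=lambda g: sum(LOSS[s][g] for s in sol_pop)))
    trace.append(exploitability(sol_pop))

print(round(trace[0], 3), round(trace[-1], 3))
```

As in the paper's analysis, the exploitability of the mixed Solver population shrinks as the populations approach the (here uniform) Nash equilibrium.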
Related papers
- Neural Solver Selection for Combinatorial Optimization [23.449310200885893]
We propose the first general framework to coordinate neural solvers, which involves feature extraction, selection model, and selection strategy.
We show that the proposed framework can effectively distribute instances and the resulting composite solver can achieve significantly better performance.
arXiv Detail & Related papers (2024-10-13T02:05:41Z)
- Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem [1.3927943269211593]
This paper aims to integrate constraint programming (CP) within a deep learning (DL) based methodology, leveraging the benefits of both.
We introduce a method that involves training a DL model using optimal solutions generated by CP, ensuring the model learns from high-quality data.
Our hybrid approach has been extensively tested on three public FJSSP benchmarks, demonstrating superior performance over five state-of-the-art DRL approaches.
arXiv Detail & Related papers (2024-03-14T10:16:57Z)
- Self-Labeling the Job Shop Scheduling Problem [15.723699332053558]
We show that generative models can be trained by sampling multiple solutions and using the best one according to the problem objective as a pseudo-label.
We prove the robustness of SLIM to various parameters and its generality by applying it to the Traveling Salesman Problem.
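The best-of-N pseudo-labeling idea in this entry can be sketched on a toy TSP instance. In the sketch below, a random tour sampler stands in for the generative model being trained, and the instance size, sample count, and names are all illustrative assumptions, not details from the paper.

```python
import random, math

# Self-labeling sketch: sample several candidate tours for one toy TSP
# instance and keep the best one (shortest) as the pseudo-label that a
# generative model would then be trained to imitate.
random.seed(0)
cities = [(random.random(), random.random()) for _ in range(8)]

def tour_length(tour):
    # Total length of the closed tour visiting the cities in the given order.
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def sample_tours(n_samples=64):
    # Stand-in for sampling multiple solutions from a generative model.
    tours = []
    for _ in range(n_samples):
        t = list(range(len(cities)))
        random.shuffle(t)
        tours.append(t)
    return tours

candidates = sample_tours()
pseudo_label = min(candidates, key=tour_length)   # best-of-N by the objective
print(round(tour_length(pseudo_label), 3))
```

The key point is that no ground-truth optimal solution is needed: the problem objective itself selects the training target.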
arXiv Detail & Related papers (2024-01-22T11:08:36Z)
- Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation [62.738582127114704]
Adar is a framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) solvers.
arXiv Detail & Related papers (2023-10-22T03:15:36Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - ASP: Learn a Universal Neural Solver! [16.98189196303338]
We propose ASP: Adaptive Staircase Policy Space Response Oracle to address these generalization issues.
ASP consists of two components: Distributional Exploration and Persistent Scale Adaption.
Our results show that ASP can help neural solvers explore and adapt to unseen distributions and varying scales, achieving superior performance.
arXiv Detail & Related papers (2023-03-01T12:47:14Z) - Learning To Dive In Branch And Bound [95.13209326119153]
We propose L2Dive to learn specific diving heuristics with graph neural networks.
We train generative models to predict variable assignments and leverage the duality of linear programs to make diving decisions.
arXiv Detail & Related papers (2023-01-24T12:01:45Z) - Learning to Solve Routing Problems via Distributionally Robust
Optimization [14.506553345693536]
Recent deep models for solving routing problems assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability.
We exploit group distributionally robust optimization (group DRO) to tackle this issue, where we jointly optimize the weights for different groups of distributions and the parameters for the deep model in an interleaved manner during training.
We also design a module based on a convolutional neural network, which allows the deep model to learn more informative latent patterns among the nodes.
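The interleaved group-DRO update this entry describes can be sketched with a scalar toy model: group weights are raised multiplicatively on whichever group currently suffers the larger loss, and the model then takes a gradient step on the weighted loss. The two quadratic "groups", the step sizes, and the averaged iterate below are all illustrative assumptions, not the paper's setup.

```python
import math

# Toy group DRO: a scalar model theta faces two groups of instances with
# quadratic losses; weights and parameters are updated in an interleaved way.
targets = [0.0, 2.0]                 # one stand-in "distribution" per group
theta, weights = 0.0, [0.5, 0.5]
eta_w, eta_theta = 0.2, 0.1
theta_sum = 0.0

for _ in range(300):
    losses = [(theta - t) ** 2 for t in targets]
    # (a) exponentiated-gradient step: up-weight the worse-performing group
    weights = [w * math.exp(eta_w * l) for w, l in zip(weights, losses)]
    z = sum(weights)
    weights = [w / z for w in weights]
    # (b) gradient step on the weighted loss sum_g w_g * (theta - t_g)^2
    grad = sum(w * 2 * (theta - t) for w, t in zip(weights, targets))
    theta -= eta_theta * grad
    theta_sum += theta

theta_avg = theta_sum / 300          # averaged iterate of the minimax game
print(round(theta_avg, 2))
```

The averaged iterate settles near the minimax point (here theta = 1, where both group losses are equal), rather than at the minimizer of either group alone.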
arXiv Detail & Related papers (2022-02-15T08:06:44Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
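The two ingredients in this entry can be sketched on a tiny Q-ensemble over discrete actions: a UCB rule (ensemble mean plus a multiple of the ensemble standard deviation) picks the action, and a weight of the form sigmoid(-std * T) + 0.5 down-weights Bellman targets where the ensemble disagrees. The functional forms follow the abstract's description; the ensemble values, lambda, and temperature below are toy assumptions.

```python
import math, statistics

# SUNRISE-style sketch: three ensemble members' Q-estimates for three actions.
Q_ENSEMBLE = [
    [1.0, 0.5, 0.2],
    [0.8, 1.5, 0.1],
    [1.2, 1.0, 0.3],
]

def ucb_action(q_ensemble, lam=1.0):
    # Pick the action with the highest upper-confidence bound across members.
    n_actions = len(q_ensemble[0])
    scores = []
    for a in range(n_actions):
        qs = [member[a] for member in q_ensemble]
        scores.append(statistics.mean(qs) + lam * statistics.pstdev(qs))
    return max(range(n_actions), key=scores.__getitem__)

def backup_weight(q_ensemble, a, temperature=10.0):
    # Weight in (0.5, 1]: low ensemble disagreement -> weight near 1,
    # so uncertain Bellman targets contribute less to the backup.
    std = statistics.pstdev(member[a] for member in q_ensemble)
    return 1 / (1 + math.exp(std * temperature)) + 0.5

a = ucb_action(Q_ENSEMBLE)
print(a, round(backup_weight(Q_ENSEMBLE, a), 3))
```

Note the two uses of the same uncertainty signal: high ensemble spread makes an action more attractive to explore, but makes its value estimate less trusted as a learning target.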
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including this list) and is not responsible for any consequences.