Leader Reward for POMO-Based Neural Combinatorial Optimization
- URL: http://arxiv.org/abs/2405.13947v1
- Date: Wed, 22 May 2024 19:27:03 GMT
- Title: Leader Reward for POMO-Based Neural Combinatorial Optimization
- Authors: Chaoyang Wang, Pengzhi Cheng, Jingze Li, Weiwei Sun
- Abstract summary: We propose Leader Reward to enhance the model's ability to generate optimal solutions.
We demonstrate that Leader Reward greatly improves the quality of the optimal solutions generated by the model.
- Score: 8.301694061287565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks based on reinforcement learning (RL) for solving combinatorial optimization (CO) problems are developing rapidly and have shown a tendency to approach or even outperform traditional solvers. However, existing methods overlook an important distinction: CO problems differ from other traditional problems in that they focus solely on the best solution provided by the model within a specific length of time, rather than on the overall quality of all solutions generated by the model. In this paper, we propose Leader Reward and apply it during two different training phases of the Policy Optimization with Multiple Optima (POMO) model to enhance the model's ability to generate optimal solutions. This approach is applicable to a variety of CO problems, such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Flexible Flow Shop Problem (FFSP), and it also works well with other POMO-based models and inference-phase strategies. We demonstrate that Leader Reward greatly improves the quality of the best solutions generated by the model. Specifically, we reduce POMO's gap to the optimum by more than 100 times on TSP100 with almost no additional computational overhead.
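The mechanism described in the abstract is a modification of POMO's policy-gradient update so that, among the multiple rollouts sampled for each instance, the single best (leader) solution is reinforced more strongly, matching the fact that only the best solution is kept at inference. Below is a minimal PyTorch sketch of one plausible reading of that idea on top of POMO's shared-baseline REINFORCE loss; the function name, tensor shapes, and the `leader_weight` factor are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def pomo_leader_loss(log_probs: torch.Tensor,
                     rewards: torch.Tensor,
                     leader_weight: float = 2.0) -> torch.Tensor:
    """POMO-style shared-baseline REINFORCE loss with an extra leader term (sketch).

    log_probs: (batch, n_rollouts) summed log-probabilities of each sampled solution.
    rewards:   (batch, n_rollouts) solution rewards, e.g. negative tour lengths for TSP.
    leader_weight: hypothetical extra weight for the best rollout of each instance;
                   the exact Leader Reward formulation is given in the paper.
    """
    # Shared baseline over the multiple rollouts of the same instance (standard POMO).
    baseline = rewards.mean(dim=1, keepdim=True)
    advantage = rewards - baseline

    # Leader Reward idea (sketch): up-weight the best (leader) rollout so training
    # focuses on the single solution that will actually be reported at inference.
    leader_idx = rewards.argmax(dim=1, keepdim=True)
    weights = torch.ones_like(advantage)
    weights.scatter_(1, leader_idx, leader_weight)

    # REINFORCE: minimize the negative weighted, advantage-scaled log-likelihood.
    return -(weights * advantage * log_probs).mean()
```

In this sketch, setting `leader_weight` to 1 recovers the ordinary POMO loss, so the extra term can be read as a tunable bias toward the solution that determines the reported optimality gap.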
Related papers
- DiffSG: A Generative Solver for Network Optimization with Diffusion Model [75.27274046562806]
Diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning parameters.
We propose a new framework, which leverages intrinsic distribution learning of diffusion generative models to learn high-quality solutions.
arXiv Detail & Related papers (2024-08-13T07:56:21Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization [15.842155380912002]
This work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization.
In particular, we design a powerful yet lightweight instance-conditioned routing adaptation module for the NCO model.
We develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solution.
arXiv Detail & Related papers (2024-05-03T08:00:19Z)
- Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization [15.189182646851865]
We propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue.
We develop a data-efficient training scheme and a flexible solution mechanism for the proposed LEHD model.
Our results confirm that our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO.
arXiv Detail & Related papers (2023-10-12T02:18:50Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which makes convergence faster, more stable, and more accurate.
Our model's network parameters are reduced to only 37% of the compared baseline's, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems [41.57773395100222]
Deep reinforcement learning (DRL) models have shown promising results in solving NP-hard combinatorial optimization problems.
This paper addresses the scalability challenge in large-scale optimization by proposing a novel approach, namely, DIMES.
Unlike previous DRL methods which suffer from costly autoregressive decoding or iterative refinements of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions.
Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.
arXiv Detail & Related papers (2022-10-08T23:24:37Z)
- Multi-Agent Deep Reinforcement Learning in Vehicular OCC [14.685237010856953]
We introduce a spectral efficiency optimization approach in vehicular OCC.
We model the optimization problem as a Markov decision process (MDP) to enable the use of solutions that can be applied online.
We verify the performance of our proposed scheme through extensive simulations and compare it with several variants of our approach and a random method.
arXiv Detail & Related papers (2022-05-05T14:25:54Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- POMO: Policy Optimization with Multiple Optima for Reinforcement Learning [8.819672165548477]
We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building such a solver.
POMO is applicable to a wide range of CO problems. It is designed to exploit the symmetries in the representation of a CO solution (a minimal augmentation sketch appears after this list).
We demonstrate the effectiveness of POMO by solving three popular NP-hard problems, namely, traveling salesman (TSP), capacitated vehicle routing (CVRP), and 0-1 knapsack (KP).
arXiv Detail & Related papers (2020-10-30T00:57:50Z)
- Automatically Learning Compact Quality-aware Surrogates for Optimization Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in the training pipeline results in predictions of the unknown parameters that lead to higher-quality decisions.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)
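As referenced in the POMO entry above, POMO also exploits symmetry at inference time by re-encoding each 2D instance under the eight reflections and rotations of the unit square and keeping the best of the resulting solutions; the main paper states that Leader Reward works well alongside such inference-phase strategies. The following is a small, self-contained sketch of that x8 augmentation, assuming coordinates in [0, 1]; it is illustrative rather than a reproduction of the POMO implementation.

```python
import torch


def pomo_x8_augment(coords: torch.Tensor) -> torch.Tensor:
    """x8 instance augmentation used in POMO-style inference for 2D routing problems.

    coords: (batch, n_nodes, 2) node coordinates assumed to lie in the unit square.
    Returns (8 * batch, n_nodes, 2): the same instances under eight reflections and
    rotations. Solving every variant and keeping the shortest tour exploits the
    symmetry of the problem representation.
    """
    x, y = coords[..., 0:1], coords[..., 1:2]
    variants = [
        torch.cat((x, y), dim=-1),
        torch.cat((1 - x, y), dim=-1),
        torch.cat((x, 1 - y), dim=-1),
        torch.cat((1 - x, 1 - y), dim=-1),
        torch.cat((y, x), dim=-1),
        torch.cat((1 - y, x), dim=-1),
        torch.cat((y, 1 - x), dim=-1),
        torch.cat((1 - y, 1 - x), dim=-1),
    ]
    # Stack the eight transformed copies along the batch axis.
    return torch.cat(variants, dim=0)
```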
This list is automatically generated from the titles and abstracts of the papers in this site.