POMO: Policy Optimization with Multiple Optima for Reinforcement
Learning
- URL: http://arxiv.org/abs/2010.16011v3
- Date: Tue, 13 Jul 2021 05:20:17 GMT
- Title: POMO: Policy Optimization with Multiple Optima for Reinforcement
Learning
- Authors: Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim, Iljoo Yoon, Youngjune Gwon,
Seungjai Min
- Abstract summary: We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building such a solver.
POMO is applicable to a wide range of CO problems. It is designed to exploit the symmetries in the representation of a CO solution.
We demonstrate the effectiveness of POMO by solving three popular NP-hard problems, namely, traveling salesman (TSP), capacitated vehicle routing (CVRP), and 0-1 knapsack (KP).
- Score: 8.819672165548477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In neural combinatorial optimization (CO), reinforcement learning (RL) can
turn a deep neural net into a fast, powerful heuristic solver of NP-hard
problems. This approach has great potential in practical applications because
it allows near-optimal solutions to be found without expert guidance backed by
substantial domain knowledge. We introduce Policy Optimization with Multiple
Optima (POMO), an end-to-end approach for building such a heuristic solver.
POMO is applicable to a wide range of CO problems. It is designed to exploit
the symmetries in the representation of a CO solution. POMO uses a modified
REINFORCE algorithm that forces diverse rollouts towards all optimal solutions.
Empirically, the low-variance baseline of POMO makes RL training fast and
stable, and it is more resistant to local minima compared to previous
approaches. We also introduce a new augmentation-based inference method, which
accompanies POMO nicely. We demonstrate the effectiveness of POMO by solving
three popular NP-hard problems, namely, traveling salesman (TSP), capacitated
vehicle routing (CVRP), and 0-1 knapsack (KP). For all three, our solver based
on POMO shows a significant improvement in performance over all recent learned
heuristics. In particular, we achieve an optimality gap of 0.14% on TSP100
while reducing inference time by more than an order of magnitude.
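The shared-baseline idea in the abstract reduces to a short computation: for each instance the policy produces N rollouts from N different starting points, and every rollout's advantage is its reward minus the mean reward of those N rollouts. The sketch below illustrates that gradient signal together with the symmetry-based x8 coordinate augmentation mentioned for inference; the function names, tensor shapes, and PyTorch framing are illustrative assumptions, not the authors' code.

```python
import torch

def pomo_style_loss(log_probs, rewards):
    """Shared-baseline REINFORCE loss in the spirit of POMO (a sketch).

    log_probs: (batch, num_starts) summed log-probability of each rollout
    rewards:   (batch, num_starts) reward per rollout, e.g. negative tour length
    """
    baseline = rewards.mean(dim=1, keepdim=True)     # mean over the N rollouts of an instance
    advantage = rewards - baseline                   # low-variance, instance-shared baseline
    return -(advantage.detach() * log_probs).mean()  # REINFORCE: ascend advantage-weighted log-prob

def augment_xy(coords):
    """x8 test-time augmentation of 2-D coordinates in [0, 1]^2
    (reflections and axis swaps that leave a routing instance equivalent)."""
    x, y = coords[..., 0:1], coords[..., 1:2]
    variants = [(x, y), (y, x), (x, 1 - y), (y, 1 - x),
                (1 - x, y), (1 - y, x), (1 - x, 1 - y), (1 - y, 1 - x)]
    return torch.cat([torch.cat(v, dim=-1) for v in variants], dim=0)
```

At inference, decoding from all N start points on each of the 8 augmented copies yields 8*N candidate solutions per instance, and the best one is kept.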
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution (see the schematic after this entry).
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
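For reference, the reformulation described in the entry above can be written schematically as below; the Gibbs-style inner distribution is only one plausible instantiation, included to make the idea concrete rather than quoted from the paper.

```latex
% Standard bi-level problem (outer variables \lambda, inner parameters \theta):
\min_{\lambda} \; F\bigl(\lambda, \theta^{*}(\lambda)\bigr)
\quad \text{s.t.} \quad
\theta^{*}(\lambda) \in \arg\min_{\theta} f(\lambda, \theta)

% Stochastic view: the inner minimizer is replaced by a smooth distribution
% over \theta (a Gibbs form is one possible choice), and the outer loss
% becomes an expectation under that distribution:
\min_{\lambda} \; \mathbb{E}_{\theta \sim p_{\lambda}}\bigl[ F(\lambda, \theta) \bigr],
\qquad
p_{\lambda}(\theta) \propto \exp\bigl(-f(\lambda, \theta) / \tau\bigr)
```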
- Leader Reward for POMO-Based Neural Combinatorial Optimization [8.301694061287565]
We propose Leader Reward to enhance the model's ability to generate optimal solutions.
We demonstrate that Leader Reward greatly improves the quality of the optimal solutions generated by the model.
arXiv Detail & Related papers (2024-05-22T19:27:03Z)
- Instance-Conditioned Adaptation for Large-scale Generalization of Neural Combinatorial Optimization [15.842155380912002]
This work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization (NCO).
In particular, we design a powerful yet lightweight instance-conditioned routing adaptation module for the NCO model.
We develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solution.
arXiv Detail & Related papers (2024-05-03T08:00:19Z)
- An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose a direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z)
- DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization [51.517956081644186]
We introduce a new graph-based diffusion framework, namely DIFUSCO.
Our framework casts NP-complete (NPC) problems as discrete {0, 1}-vector optimization problems (see the example formulation after this entry).
For the MIS problem, DIFUSCO outperforms the previous state-of-the-art neural solver on the challenging SATLIB benchmark.
arXiv Detail & Related papers (2023-02-16T11:13:36Z)
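As an illustration of the "{0, 1}-vector" formulation mentioned above, the Maximum Independent Set (MIS) problem referenced in this entry can be written as optimization over one binary variable per vertex; this is the standard textbook formulation, not text from the paper.

```latex
% MIS on a graph G = (V, E), with x_v = 1 iff vertex v is in the independent set:
\max_{x \in \{0,1\}^{|V|}} \; \sum_{v \in V} x_v
\quad \text{s.t.} \quad
x_u + x_v \le 1 \quad \text{for all } (u, v) \in E
```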
- Learning Adaptive Evolutionary Computation for Solving Multi-Objective Optimization Problems [3.3266268089678257]
This paper proposes a framework that integrates multi-objective evolutionary algorithms (MOEAs) with adaptive parameter control using Deep Reinforcement Learning (DRL).
The DRL policy is trained to adaptively set the mutation intensity and mutation probability applied to solutions during optimization (see the sketch after this entry).
We show the learned policy is transferable, i.e., the policy trained on a simple benchmark problem can be directly applied to solve the complex warehouse optimization problem.
arXiv Detail & Related papers (2022-11-01T22:08:34Z)
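A minimal sketch of the adaptive-control loop described in the entry above: at every generation an RL policy reads simple population statistics and outputs the mutation probability and strength used by the evolutionary step. The `policy` callable, the state features, truncation selection, and Gaussian mutation are all illustrative assumptions rather than the paper's algorithm, and training of the policy itself is omitted.

```python
import numpy as np

def mutate(pop, prob, sigma, rng):
    """Gaussian mutation: each gene is perturbed with probability `prob`."""
    mask = rng.random(pop.shape) < prob
    return pop + mask * rng.normal(0.0, sigma, pop.shape)

def run_adaptive_ea(policy, fitness_fn, pop_size=50, dim=10, generations=100, seed=0):
    """Evolutionary loop whose mutation parameters are set online by a DRL policy."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, dim))
    for gen in range(generations):
        fit = np.array([fitness_fn(ind) for ind in pop])               # lower is better (minimization)
        state = np.array([gen / generations, fit.mean(), fit.std()])   # coarse population statistics
        prob, sigma = policy(state)                                     # adaptive mutation parameters
        parents = pop[np.argsort(fit)[: pop_size // 2]]                 # simple truncation selection
        pop = mutate(np.repeat(parents, 2, axis=0), prob, sigma, rng)
    return pop
```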
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of the compared baseline's, and the solution gap of our model relative to the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Multi-Agent Deep Reinforcement Learning in Vehicular OCC [14.685237010856953]
We introduce a spectral efficiency optimization approach in vehicular optical camera communication (OCC).
We model the optimization problem as a Markov decision process (MDP) to enable the use of solutions that can be applied online.
We verify the performance of our proposed scheme through extensive simulations and compare it with various variants of our approach and a random method.
arXiv Detail & Related papers (2022-05-05T14:25:54Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- A Hybrid Framework Using a QUBO Solver For Permutation-Based Combinatorial Optimization [5.460573052311485]
We propose a hybrid framework to solve large-scale permutation-based problems using a high-performance quadratic unconstrained binary optimization solver.
We propose techniques to overcome the challenges of using a QUBO solver, which typically comes with a limited number of bits (see the encoding sketch after this entry).
arXiv Detail & Related papers (2020-09-27T07:15:25Z)
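To make the "limited number of bits" challenge above concrete: a permutation of n items is typically encoded for a QUBO solver with n*n binary variables x[i, p] ("item i sits at position p"), plus penalty terms forcing each item and each position to be used exactly once, so the bit budget grows quadratically with problem size. The construction below is the standard one-hot encoding with an assumed linear, assignment-style objective, shown for illustration only and not claimed to be the paper's specific technique.

```python
import itertools
import numpy as np

def permutation_qubo(n, cost, penalty=10.0):
    """Build a QUBO matrix for placing n items into n positions.

    Variables are (item i, position p) flattened to index i*n + p, so the
    encoding consumes n*n bits. `cost[i][p]` is an assumed per-assignment cost.
    """
    idx = lambda i, p: i * n + p
    Q = np.zeros((n * n, n * n))
    # Objective terms live on the diagonal (x^2 = x for binary variables).
    for i, p in itertools.product(range(n), range(n)):
        Q[idx(i, p), idx(i, p)] += cost[i][p]
    # Penalty (sum_p x[i,p] - 1)^2: each item takes exactly one position.
    for i in range(n):
        for p in range(n):
            Q[idx(i, p), idx(i, p)] -= penalty
            for q in range(p + 1, n):
                Q[idx(i, p), idx(i, q)] += 2 * penalty
    # Penalty (sum_i x[i,p] - 1)^2: each position holds exactly one item.
    for p in range(n):
        for i in range(n):
            Q[idx(i, p), idx(i, p)] -= penalty
            for j in range(i + 1, n):
                Q[idx(i, p), idx(j, p)] += 2 * penalty
    return Q
```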
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates a Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested on four types of problems, including compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
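The last entry describes a loop in which a neural network surrogate is trained online on designs already evaluated by FEM and then used to choose the next design to evaluate. The sketch below replaces the FEM call with a generic `evaluate` function and uses a small scikit-learn regressor with random search over the surrogate; all of these choices are simplifying assumptions meant to show the shape of the self-directed loop, not the paper's method.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def self_directed_minimize(evaluate, dim, budget=50, pool_size=500, seed=0):
    """Online surrogate loop: evaluate -> refit surrogate -> propose next design."""
    rng = np.random.default_rng(seed)
    X = rng.random((5, dim))                              # a few random initial designs
    y = np.array([evaluate(x) for x in X])                # expensive evaluations (FEM stand-in)
    surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    for _ in range(budget):
        surrogate.fit(X, y)                               # retrain on all data gathered so far
        pool = rng.random((pool_size, dim))               # cheap candidate designs
        nxt = pool[np.argmin(surrogate.predict(pool))]    # surrogate picks the most promising one
        X = np.vstack([X, nxt])
        y = np.append(y, evaluate(nxt))                   # one more expensive evaluation
    return X[np.argmin(y)], float(y.min())

# Toy usage with a quadratic bowl standing in for an FEM objective.
if __name__ == "__main__":
    best_x, best_f = self_directed_minimize(lambda x: float(np.sum((x - 0.3) ** 2)), dim=4)
    print(best_x, best_f)
```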
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.