Learning What to Defer for Maximum Independent Sets
- URL: http://arxiv.org/abs/2006.09607v2
- Date: Mon, 29 Jun 2020 06:17:07 GMT
- Title: Learning What to Defer for Maximum Independent Sets
- Authors: Sungsoo Ahn, Younggyo Seo, Jinwoo Shin
- Abstract summary: We propose a novel DRL scheme, coined learning what to defer (LwD), where the agent adaptively shrinks or stretches the number of stages by learning to distribute the element-wise decisions of the solution at each stage.
We apply the proposed framework to the maximum independent set (MIS) problem, and demonstrate its significant improvement over the current state-of-the-art DRL scheme.
- Score: 84.00112106334655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing efficient algorithms for combinatorial optimization appears
ubiquitously in various scientific fields. Recently, deep reinforcement
learning (DRL) frameworks have gained considerable attention as a new approach:
they can automate the design of a solver while relying less on sophisticated
domain knowledge of the target problem. However, the existing DRL solvers
determine the solution using a number of stages proportional to the number of
elements in the solution, which severely limits their applicability to
large-scale graphs. In this paper, we seek to resolve this issue by proposing a
novel DRL scheme, coined learning what to defer (LwD), where the agent
adaptively shrinks or stretches the number of stages by learning to distribute
the element-wise decisions of the solution at each stage. We apply the proposed
framework to the maximum independent set (MIS) problem, and demonstrate its
significant improvement over the current state-of-the-art DRL scheme. We also
show that LwD can outperform the conventional MIS solvers on large-scale graphs
having millions of vertices, under a limited time budget.
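The deferral idea in the abstract can be illustrated with a small sketch. It is one plausible reading of the abstract, not the authors' implementation: at each stage every undecided vertex is either included, excluded, or deferred to a later stage, and a clean-up step keeps the partial solution independent. The names `deferred_mis` and `random_policy`, the specific clean-up rule, and the use of a random policy in place of a trained agent are all assumptions made for illustration.

```python
import random

# Vertex states: a decision is either made (0 or 1) or deferred to a later stage.
DEFERRED, EXCLUDED, INCLUDED = -1, 0, 1


def random_policy(vertex, rng):
    """Hypothetical stand-in for a learned policy: include, exclude, or defer."""
    return rng.choice([INCLUDED, EXCLUDED, DEFERRED])


def deferred_mis(adj, policy=random_policy, max_stages=100, seed=0):
    """Build an independent set over several stages (illustrative sketch only).

    adj maps each vertex to a list of its neighbours (undirected graph).
    """
    rng = random.Random(seed)
    state = {v: DEFERRED for v in adj}
    for _ in range(max_stages):
        pending = [v for v in adj if state[v] == DEFERRED]
        if not pending:
            break
        # Update phase: the policy decides on every deferred vertex in one stage.
        for v in pending:
            state[v] = policy(v, rng)
        # Clean-up (assumed rule): adjacent included vertices conflict,
        # so both are sent back to the deferred state.
        conflicted = {
            v for v in adj
            if state[v] == INCLUDED and any(state[u] == INCLUDED for u in adj[v])
        }
        for v in conflicted:
            state[v] = DEFERRED
        # Clean-up (assumed rule): a deferred neighbour of an included vertex
        # can no longer join the set, so it is excluded.
        for v in adj:
            if state[v] == DEFERRED and any(state[u] == INCLUDED for u in adj[v]):
                state[v] = EXCLUDED
    return {v for v in adj if state[v] == INCLUDED}


if __name__ == "__main__":
    # A 5-cycle: any independent set has at most 2 vertices.
    cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
    print(sorted(deferred_mis(cycle)))
```

Because many vertices are decided per stage, the number of stages is not tied to the size of the solution; a trained policy would replace `random_policy` and choose what to defer based on the graph.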
Related papers
- Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II [52.083337333478674]
This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW).
The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes produced by the WADRL.
arXiv Detail & Related papers (2024-07-18T02:46:06Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples [12.48027669682156]
Flow of Reasoning aims to improve reasoning quality and diversity with minimal data.
FoR formulates multi-step LLM reasoning as a Markovian flow on a DAG-structured reasoning graph.
Experiments show that, with limited training examples, FoR enables the discovery of diverse, creative, high-quality solutions.
arXiv Detail & Related papers (2024-06-09T07:06:58Z)
- Deep Reinforcement Learning Guided Improvement Heuristic for Job Shop Scheduling [30.45126420996238]
This paper proposes a novel DRL-guided improvement heuristic for solving the job shop scheduling problem (JSSP), where graph representation is employed to encode complete solutions.
We design a Graph Neural-Network-based representation scheme, consisting of two modules to effectively capture the information of dynamic topology and different types of nodes in graphs encountered during the improvement process.
We prove that our method scales linearly with problem size. Experiments on classic benchmarks show that the improvement policy learned by our method outperforms state-of-the-art DRL-based methods by a large margin.
arXiv Detail & Related papers (2022-11-20T10:20:13Z)
- DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems [41.57773395100222]
Deep reinforcement learning (DRL) models have shown promising results in solving NP-hard Combinatorial Optimization problems.
This paper addresses the scalability challenge in large-scale optimization by proposing a novel approach, namely, DIMES.
Unlike previous DRL methods which suffer from costly autoregressive decoding or iterative refinements of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions.
Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.
arXiv Detail & Related papers (2022-10-08T23:24:37Z)
- Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL).
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z)
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
- Reversible Action Design for Combinatorial Optimization with Reinforcement Learning [35.50454156611722]
Reinforcement learning (RL) has recently emerged as a new framework to tackle combinatorial optimization problems (COPs).
We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a wide variety of COPs.
arXiv Detail & Related papers (2021-02-14T18:05:42Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- sKPNSGA-II: Knee point based MOEA with self-adaptive angle for Mission Planning Problems [2.191505742658975]
Some problems have many objectives, which leads to a large number of non-dominated solutions.
This paper presents a new algorithm that has been designed to obtain the most significant solutions.
This new algorithm has been applied to a real-world Unmanned Air Vehicle (UAV) mission planning problem.
arXiv Detail & Related papers (2020-02-20T17:07:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.