Learning Large Neighborhood Search Policy for Integer Programming
- URL: http://arxiv.org/abs/2111.03466v1
- Date: Mon, 1 Nov 2021 09:10:49 GMT
- Title: Learning Large Neighborhood Search Policy for Integer Programming
- Authors: Yaoxin Wu, Wen Song, Zhiguang Cao and Jie Zhang
- Abstract summary: We propose a deep reinforcement learning (RL) method to learn a large neighborhood search (LNS) policy for integer programming (IP).
We represent all subsets by factorizing them into binary decisions on each variable.
We then design a neural network to learn policies for each variable in parallel, trained by a customized actor-critic algorithm.
- Score: 14.089039170072084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a deep reinforcement learning (RL) method to learn a large
neighborhood search (LNS) policy for integer programming (IP). The RL policy is
trained as the destroy operator to select a subset of variables at each step,
which is reoptimized by an IP solver as the repair operator. However, the
combinatorial number of variable subsets prevents direct application of typical
RL algorithms. To tackle this challenge, we represent all subsets by
factorizing them into binary decisions on each variable. We then design a
neural network to learn policies for each variable in parallel, trained by a
customized actor-critic algorithm. We evaluate the proposed method on four
representative IP problems. Results show that it can find better solutions than
SCIP in much less time, and significantly outperform other LNS baselines with
the same runtime. Moreover, these advantages notably persist when the policies
generalize to larger problems. Further experiments with Gurobi also reveal that
our method can outperform this state-of-the-art commercial solver within the
same time limit.
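A minimal sketch of the factorized destroy policy described above, assuming PyTorch; the per-variable features, network sizes, and the surrounding LNS loop are illustrative placeholders, not the paper's actual architecture or training setup.

```python
# Hedged sketch: per-variable Bernoulli "destroy" decisions for LNS.
# Features, sizes, and the training loop are illustrative assumptions.
import torch
import torch.nn as nn

class DestroyPolicy(nn.Module):
    """One destroy probability per IP variable, computed in parallel."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, var_feats: torch.Tensor) -> torch.Tensor:
        # var_feats: (num_vars, feat_dim) -> (num_vars,) destroy probabilities
        return torch.sigmoid(self.net(var_feats)).squeeze(-1)

policy = DestroyPolicy(feat_dim=8)
var_feats = torch.randn(20, 8)          # toy features for 20 variables
probs = policy(var_feats)
mask = torch.bernoulli(probs)           # 1 = variable is freed for re-optimization
log_prob = (mask * probs.clamp_min(1e-8).log()
            + (1 - mask) * (1 - probs).clamp_min(1e-8).log()).sum()
# Repair step (not shown): fix the unmasked variables at their current values
# and let an IP solver (e.g. SCIP) re-optimize the freed ones. The resulting
# objective improvement serves as the reward in the actor-critic update.
```

Factorizing the subset choice this way turns the exponentially large action space of variable subsets into independent per-variable binary decisions, whose joint log-probability is simply the sum computed above and can be fed to a standard policy-gradient (actor-critic) update.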
Related papers
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - An actor-critic algorithm with policy gradients to solve the job shop
scheduling problem using deep double recurrent agents [1.3812010983144802]
We propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP)
The aim is to build up a greedy-like heuristic able to learn on some distribution of JSSP instances, differing in the number of jobs and machines.
As expected, the model can generalize, to some extent, to larger problems or to instances originating from a distribution different from the one used in training.
arXiv Detail & Related papers (2021-10-18T07:55:39Z) - Direct Random Search for Fine Tuning of Deep Reinforcement Learning
Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher-performing agents on the environments we tested; a toy sketch of the idea appears after this list.
arXiv Detail & Related papers (2021-09-12T20:12:46Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Discovering a set of policies for the worst case reward [15.682107694476779]
We focus on the most conservative instantiation of SIPs, set-max policies (SMPs).
Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks.
We show that the worst-case performance of the resulting SMP strictly improves at each iteration, and the algorithm only stops when there does not exist a policy that leads to improved performance.
arXiv Detail & Related papers (2021-02-08T16:27:09Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z) - Reinforcement Learning for Variable Selection in a Branch and Bound
Algorithm [0.10499611180329801]
We leverage patterns in real-world instances to learn from scratch a new branching strategy optimised for a given problem.
We propose FMSTS, a novel Reinforcement Learning approach specifically designed for this task.
arXiv Detail & Related papers (2020-05-20T13:15:48Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.