Learning Large Neighborhood Search Policy for Integer Programming
- URL: http://arxiv.org/abs/2111.03466v1
- Date: Mon, 1 Nov 2021 09:10:49 GMT
- Title: Learning Large Neighborhood Search Policy for Integer Programming
- Authors: Yaoxin Wu, Wen Song, Zhiguang Cao and Jie Zhang
- Abstract summary: We propose a deep reinforcement learning (RL) method to learn a large neighborhood search (LNS) policy for integer programming (IP).
We represent all subsets by factorizing them into binary decisions on each variable.
We then design a neural network to learn policies for each variable in parallel, trained by a customized actor-critic algorithm.
- Score: 14.089039170072084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a deep reinforcement learning (RL) method to learn a large
neighborhood search (LNS) policy for integer programming (IP). The RL policy is
trained as the destroy operator to select a subset of variables at each step,
which is reoptimized by an IP solver as the repair operator. However, the
combinatorial number of variable subsets prevents direct application of typical
RL algorithms. To tackle this challenge, we represent all subsets by
factorizing them into binary decisions on each variable. We then design a
neural network to learn policies for each variable in parallel, trained by a
customized actor-critic algorithm. We evaluate the proposed method on four
representative IP problems. Results show that it can find better solutions than
SCIP in much less time, and significantly outperform other LNS baselines with
the same runtime. Moreover, these advantages notably persist when the policies
generalize to larger problems. Further experiments with Gurobi also reveal that
our method can outperform this state-of-the-art commercial solver within the
same time limit.
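A minimal sketch of the factorized destroy policy described above, assuming PyTorch; the per-variable features, network sizes, and the surrounding LNS loop are illustrative placeholders, not the paper's actual architecture or training setup.

```python
# Hedged sketch: per-variable Bernoulli "destroy" decisions for LNS.
# Features, sizes, and the training loop are illustrative assumptions.
import torch
import torch.nn as nn

class DestroyPolicy(nn.Module):
    """One destroy probability per IP variable, computed in parallel."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, var_feats: torch.Tensor) -> torch.Tensor:
        # var_feats: (num_vars, feat_dim) -> (num_vars,) destroy probabilities
        return torch.sigmoid(self.net(var_feats)).squeeze(-1)

policy = DestroyPolicy(feat_dim=8)
var_feats = torch.randn(20, 8)          # toy features for 20 variables
probs = policy(var_feats)
mask = torch.bernoulli(probs)           # 1 = variable is freed for re-optimization
log_prob = (mask * probs.clamp_min(1e-8).log()
            + (1 - mask) * (1 - probs).clamp_min(1e-8).log()).sum()
# Repair step (not shown): fix the unmasked variables at their current values
# and let an IP solver (e.g. SCIP) re-optimize the freed ones. The resulting
# objective improvement serves as the reward in the actor-critic update.
```

Factorizing the subset choice this way turns the exponentially large action space of variable subsets into independent per-variable binary decisions, whose joint log-probability is simply the sum computed above and can be fed to a standard policy-gradient (actor-critic) update.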
Related papers
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - An actor-critic algorithm with policy gradients to solve the job shop
scheduling problem using deep double recurrent agents [1.3812010983144802]
We propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP)
The aim is to build up a greedy-like heuristic able to learn on some distribution of JSSP instances, differing in the number of jobs and machines.
As expected, the model can generalize, to some extent, to larger problems or to instances originating from a distribution different from the one used in training.
arXiv Detail & Related papers (2021-10-18T07:55:39Z) - Direct Random Search for Fine Tuning of Deep Reinforcement Learning
Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher-performing agents on the environments we tested; a toy sketch of the idea appears after this list.
arXiv Detail & Related papers (2021-09-12T20:12:46Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Discovering a set of policies for the worst case reward [15.682107694476779]
We focus on the most conservative instantiation of SIPs, set-max policies (SMPs).
Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks.
We show that the worst-case performance of the resulting SMP strictly improves at each iteration, and the algorithm only stops when there does not exist a policy that leads to improved performance.
arXiv Detail & Related papers (2021-02-08T16:27:09Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z) - Reinforcement Learning for Variable Selection in a Branch and Bound
Algorithm [0.10499611180329801]
We leverage patterns in real-world instances to learn from scratch a new branching strategy optimised for a given problem.
We propose FMSTS, a novel Reinforcement Learning approach specifically designed for this task.
arXiv Detail & Related papers (2020-05-20T13:15:48Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through
Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.