Deep Reinforcement Learning for Exact Combinatorial Optimization:
Learning to Branch
- URL: http://arxiv.org/abs/2206.06965v1
- Date: Tue, 14 Jun 2022 16:35:58 GMT
- Title: Deep Reinforcement Learning for Exact Combinatorial Optimization:
Learning to Branch
- Authors: Tianyu Zhang, Amin Banitalebi-Dehkordi, and Yong Zhang
- Abstract summary: We propose a new approach for solving the data labeling and inference issues in optimization based on the use of the reinforcement learning (RL) paradigm.
We use imitation learning to bootstrap an RL agent and then use Proximal Policy (PPO) to further explore global optimal actions.
- Score: 13.024115985194932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Branch-and-bound is a systematic enumerative method for combinatorial
optimization, where the performance highly relies on the variable selection
strategy. State-of-the-art handcrafted heuristic strategies suffer from
relatively slow inference time for each selection, while the current machine
learning methods require a significant amount of labeled data. We propose a new
approach for solving the data labeling and inference latency issues in
combinatorial optimization based on the use of the reinforcement learning (RL)
paradigm. We use imitation learning to bootstrap an RL agent and then use
Proximal Policy Optimization (PPO) to further explore global optimal actions.
Then, a value network is used to run Monte-Carlo tree search (MCTS) to enhance
the policy network. We evaluate the performance of our method on four different
categories of combinatorial optimization problems and show that our approach
performs strongly compared to the state-of-the-art machine learning and
heuristics based methods.
Related papers
- Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language models (LLMs)-based prompts.
We identify two pivotal factors in model parameter learning: update direction and update method.
In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph- kernel multi-arms bandit algorithm to learn online the optimal source placement in large scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
We introduce a Deep Reinforcement Learning based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search.
We evaluate the proposed method on a problem with orienteering weights and time windows, as presented in an IJCAI competition.
The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches.
arXiv Detail & Related papers (2022-11-01T21:33:46Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Sample-Efficient, Exploration-Based Policy Optimisation for Routing
Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach that is based on entropy.
In addition, we design an off-policy-based reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various route problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z) - Reinforcement Learning for Branch-and-Bound Optimisation using
Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching; a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z) - Automatic tuning of hyper-parameters of reinforcement learning
algorithms using Bayesian optimization with behavioral cloning [0.0]
In reinforcement learning (RL), the information content of data gathered by the learning agent is dependent on the setting of many hyper- parameters.
In this work, a novel approach for autonomous hyper- parameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to other manual tweaking and optimization-based approaches.
arXiv Detail & Related papers (2021-12-15T13:10:44Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Meta Learning Black-Box Population-Based Optimizers [0.0]
We propose the use of meta-learning to infer population-based blackbox generalizations.
We show that the meta-loss function encourages a learned algorithm to alter its search behavior so that it can easily fit into a new context.
arXiv Detail & Related papers (2021-03-05T08:13:25Z) - Reinforcement Learning for Variable Selection in a Branch and Bound
Algorithm [0.10499611180329801]
We leverage patterns in real-world instances to learn from scratch a new branching strategy optimised for a given problem.
We propose FMSTS, a novel Reinforcement Learning approach specifically designed for this task.
arXiv Detail & Related papers (2020-05-20T13:15:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.