Combinatorial Optimization with Policy Adaptation using Latent Space Search
- URL: http://arxiv.org/abs/2311.13569v2
- Date: Tue, 28 May 2024 14:22:20 GMT
- Title: Combinatorial Optimization with Policy Adaptation using Latent Space Search
- Authors: Felix Chalumeau, Shikha Surana, Clement Bonnet, Nathan Grinsztajn, Arnu Pretorius, Alexandre Laterre, Thomas D. Barrett,
- Abstract summary: We present a novel approach for designing performant algorithms to solve complex, typically NP-hard, problems.
We show that our search strategy outperforms state-of-the-art approaches on 11 standard benchmarking tasks.
- Score: 44.12073954093942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combinatorial Optimization underpins many real-world applications and yet, designing performant algorithms to solve these complex, typically NP-hard, problems remains a significant research challenge. Reinforcement Learning (RL) provides a versatile framework for designing heuristics across a broad spectrum of problem domains. However, despite notable progress, RL has not yet supplanted industrial solvers as the go-to solution. Current approaches emphasize pre-training heuristics that construct solutions but often rely on search procedures with limited variance, such as stochastically sampling numerous solutions from a single policy or employing computationally expensive fine-tuning of the policy on individual problem instances. Building on the intuition that performant search at inference time should be anticipated during pre-training, we propose COMPASS, a novel RL approach that parameterizes a distribution of diverse and specialized policies conditioned on a continuous latent space. We evaluate COMPASS across three canonical problems - Travelling Salesman, Capacitated Vehicle Routing, and Job-Shop Scheduling - and demonstrate that our search strategy (i) outperforms state-of-the-art approaches on 11 standard benchmarking tasks and (ii) generalizes better, surpassing all other approaches on a set of 18 procedurally transformed instance distributions.
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control
Approach [0.3093890460224435]
We address the solution of the popular Wordle puzzle, using new reinforcement learning methods.
For the Wordle puzzle, they yield on-line solution strategies that are very close to optimal at relatively modest computational cost.
arXiv Detail & Related papers (2022-11-15T03:46:41Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Sample-Efficient, Exploration-Based Policy Optimisation for Routing
Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach that is based on entropy.
In addition, we design an off-policy-based reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various route problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z) - Efficient lifting of symmetry breaking constraints for complex
combinatorial problems [9.156939957189502]
This work extends the learning framework and implementation of a model-based approach for Answer Set Programming.
In particular, we incorporate a new conflict analysis algorithm in the Inductive Logic Programming system ILASP.
arXiv Detail & Related papers (2022-05-14T20:42:13Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across non-family problems.
We show that for weakly-ized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action space and state dependent constraints.
Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep-policy method that leverages neural networks (NNs) to approximate the value function.
We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishments and find it considerably outperforms existing methods by as much as 14.7% on average.
arXiv Detail & Related papers (2021-12-04T01:40:34Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Reinforcement Learning for Combinatorial Optimization: A Survey [12.323976053967066]
Many traditional algorithms for solving optimization problems involve using hand-crafteds that sequentially construct a solution.
Reinforcement learning (RL) proposes a good alternative to automate the search of theses by training an agent in a supervised or self-supervised manner.
arXiv Detail & Related papers (2020-03-07T16:19:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.