Reinforcement Learning with Combinatorial Actions: An Application to
Vehicle Routing
- URL: http://arxiv.org/abs/2010.12001v1
- Date: Thu, 22 Oct 2020 19:32:21 GMT
- Title: Reinforcement Learning with Combinatorial Actions: An Application to
Vehicle Routing
- Authors: Arthur Delarue, Ross Anderson, Christian Tjandraatmadja
- Abstract summary: We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space.
We present an application of this framework to the capacitated vehicle routing problem (CVRP).
On each instance, we model an action as the construction of a single route, and consider a deterministic policy which is improved through a simple policy iteration algorithm.
- Score: 9.995347522610674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value-function-based methods have long played an important role in
reinforcement learning. However, finding the best next action given a value
function of arbitrary complexity is nontrivial when the action space is too
large for enumeration. We develop a framework for value-function-based deep
reinforcement learning with a combinatorial action space, in which the action
selection problem is explicitly formulated as a mixed-integer optimization
problem. As a motivating example, we present an application of this framework
to the capacitated vehicle routing problem (CVRP), a combinatorial optimization
problem in which a set of locations must be covered by a single vehicle with
limited capacity. On each instance, we model an action as the construction of a
single route, and consider a deterministic policy which is improved through a
simple policy iteration algorithm. Our approach is competitive with other
reinforcement learning methods and achieves an average gap of 1.7% with
state-of-the-art OR methods on standard library instances of medium size.
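The action-selection step the abstract describes (choosing the best next route under a learned value function) can be illustrated on a toy instance. The sketch below is a stand-in, not the paper's method: it enumerates feasible routes by brute force where the paper formulates the argmin as a mixed-integer program, and the demands, capacity, and route cost are all invented for illustration.

```python
import itertools

# Toy CVRP instance (hypothetical numbers): customer demands and a
# vehicle capacity. An "action" is one feasible route, i.e. a subset of
# the remaining customers whose total demand fits in the vehicle.
DEMANDS = {1: 4, 2: 3, 3: 5, 4: 2}
CAPACITY = 7

def feasible_routes(remaining):
    """Enumerate every non-empty feasible route over `remaining`."""
    for r in range(1, len(remaining) + 1):
        for subset in itertools.combinations(sorted(remaining), r):
            if sum(DEMANDS[c] for c in subset) <= CAPACITY:
                yield frozenset(subset)

def route_cost(route):
    # Placeholder travel cost (a real instance would solve a small TSP):
    # one unit to leave the depot plus one per customer visited.
    return 1 + len(route)

_cache = {}

def cost_to_go(remaining):
    """Value function: cheapest way to cover `remaining`, one route per step."""
    key = frozenset(remaining)
    if not key:
        return 0
    if key not in _cache:
        # The greedy step minimises immediate route cost plus cost-to-go.
        # The paper solves this argmin as a mixed-integer program; on a
        # toy instance plain enumeration suffices.
        _cache[key] = min(route_cost(a) + cost_to_go(key - a)
                          for a in feasible_routes(key))
    return _cache[key]

print(cost_to_go(set(DEMANDS)))  # optimal plan uses two routes, {1,2} and {3,4}
```

Replacing the enumeration with a MIP over route-membership variables is what lets the same greedy step scale to action spaces far too large to list.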
Related papers
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
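A minimal sketch of the UCB idea applied to weight-vector search, under invented assumptions: a small discrete pool of candidate weight vectors, and a noisy evaluation of each scalarised utility (the WEIGHTS pool, TRUE_RETURNS, and the noise model are all hypothetical).

```python
import math
import random

random.seed(0)

# Hypothetical pool of weight vectors for a 2-objective problem, and
# assumed mean per-objective returns. Pulling an arm simulates a noisy
# evaluation of the scalarised utility w . r.
WEIGHTS = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
TRUE_RETURNS = (0.2, 0.8)

def pull(w):
    mean = sum(wi * ri for wi, ri in zip(w, TRUE_RETURNS))
    return mean + random.gauss(0.0, 0.1)

counts = [0] * len(WEIGHTS)
totals = [0.0] * len(WEIGHTS)

def ucb1(i, t):
    # Optimistic estimate: empirical mean plus an exploration bonus that
    # shrinks as an arm is pulled more often.
    if counts[i] == 0:
        return float("inf")
    return totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])

for t in range(1, 301):
    i = max(range(len(WEIGHTS)), key=lambda a: ucb1(a, t))
    totals[i] += pull(WEIGHTS[i])
    counts[i] += 1

best = max(range(len(WEIGHTS)), key=lambda a: totals[a] / counts[a])
print(WEIGHTS[best])
```

The exploration bonus concentrates pulls on the most promising weight vectors while still occasionally revisiting the others, which is the mechanism the abstract describes.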
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z)
- Towards Omni-generalizable Neural Methods for Vehicle Routing Problems [14.210085924625705]
This paper studies a challenging yet realistic setting, which considers generalization across both size and distribution in VRPs.
We propose a generic meta-learning framework, which enables effective training of a model with the capability of fast adaptation to new tasks during inference.
arXiv Detail & Related papers (2023-05-31T06:14:34Z)
- Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning [16.844525262228103]
In cooperative multi-agent reinforcement learning, a potentially large number of agents jointly optimize a global reward function, which causes the joint action space to blow up exponentially in the number of agents.
As a minimal requirement, we assume access to an argmax oracle that efficiently computes the greedy policy for any Q-function in the model class.
We propose efficient algorithms for this setting whose compute and query complexity are polynomial in all relevant problem parameters.
arXiv Detail & Related papers (2023-02-08T23:42:49Z)
- Supervised Permutation Invariant Networks for Solving the CVRP with Bounded Fleet Size [3.5235974685889397]
Learning to solve optimization problems, such as the vehicle routing problem, offers great computational advantages.
We propose a powerful supervised deep learning framework that constructs a complete tour plan from scratch while respecting an a priori fixed number of vehicles.
In combination with an efficient post-processing scheme, our supervised approach is not only much faster and easier to train but also achieves competitive results.
arXiv Detail & Related papers (2022-01-05T10:32:18Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing a structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Learning Vehicle Routing Problems using Policy Optimisation [4.093722933440819]
State-of-the-art approaches learn a policy using reinforcement learning, and the learnt policy acts as a pseudo solver.
These approaches have demonstrated good performance in some cases, but given the large search space typical of routing problems, they can converge too quickly to a poor policy.
We propose entropy regularised reinforcement learning (ERRL) that supports exploration by providing more policies.
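The entropy-regularisation idea behind ERRL can be sketched in a few lines. The logits, advantage estimates, and entropy weight below are invented for illustration, not taken from the paper:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy of a discrete distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical 3-action construction step (e.g. which customer to visit
# next). One action looks slightly better, so an unregularised learner
# would quickly concentrate on it.
logits = [2.0, 1.8, 0.5]
probs = softmax(logits)

beta = 0.1                     # assumed entropy weight
advantage = [1.0, 0.9, 0.1]    # assumed per-action advantage estimates

# Entropy-regularised objective: expected advantage plus beta * H(pi).
# Maximising this trades exploitation against keeping probability mass
# spread over near-equally-good tours, which slows premature collapse.
objective = sum(p * a for p, a in zip(probs, advantage)) + beta * entropy(probs)
print(objective)
```

The entropy bonus is what "supports exploration" in the abstract's phrasing: gradients on this objective penalise policies that become near-deterministic too early.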
arXiv Detail & Related papers (2020-12-24T14:18:56Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
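The two ingredients can be sketched with a tiny Q-ensemble. The ensemble values and the `lam`/`temperature` parameters below are illustrative assumptions; the down-weighting function is one plausible sigmoid-of-disagreement form, not necessarily the paper's exact parametrisation.

```python
import math
import statistics

# Hypothetical 2-member Q-ensemble over 3 discrete actions in one state.
q_ensemble = [
    [1.0, 2.0, 0.5],  # Q_1(s, .)
    [1.4, 1.0, 0.6],  # Q_2(s, .)
]

def ucb_action(qs, lam=1.0):
    """Ingredient (b): pick the action maximising mean + lam * std
    across the ensemble, i.e. an upper-confidence bound."""
    n_actions = len(qs[0])
    def score(a):
        vals = [q[a] for q in qs]
        return statistics.fmean(vals) + lam * statistics.pstdev(vals)
    return max(range(n_actions), key=score)

def backup_weight(target_std, temperature=10.0):
    """Ingredient (a): down-weight Bellman targets the ensemble disagrees
    on, via a sigmoid of the negative target-Q std, shifted so that a
    fully agreed-upon target gets weight 1."""
    return 1.0 / (1.0 + math.exp(temperature * target_std)) + 0.5

print(ucb_action(q_ensemble))  # favours the high-mean, high-uncertainty action
print(backup_weight(0.0))      # full weight when the ensemble agrees
```

In training, each ensemble member's TD loss would be multiplied by `backup_weight` of its target's disagreement, while `ucb_action` drives exploration at acting time.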
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
- Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization [0.0]
We combine policy-based reinforcement learning with local optimization to foster and synthesize the best of the two methodologies.
We evaluate the proposed method using lane-change scenarios with a varying number of vehicles.
arXiv Detail & Related papers (2020-03-06T12:57:25Z)