Related papers: End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW

URL: http://arxiv.org/abs/2512.01518v1
Date: Mon, 01 Dec 2025 10:43:27 GMT
Title: End-to-end Deep Reinforcement Learning for Stochastic Multi-objective Optimization in C-VRPTW
Authors: Abdo Abouelrous, Laurens Bliek, Yaoxin Wu, Yingqian Zhang
Abstract summary: We consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length. We propose a model that simultaneously addresses stochasticity and multi-objectivity and provide a refined training mechanism for this model through scenario clustering.
Score: 15.392818864851654
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we consider learning-based applications in routing to solve a Vehicle Routing variant characterized by stochasticity and multiple objectives. Such problems are representative of practical settings where decision-makers have to deal with uncertainty in the operational environment as well as multiple conflicting objectives due to different stakeholders. We specifically consider travel time uncertainty. We also consider two objectives, total travel time and route makespan, that jointly target operational efficiency and labor regulations on shift length, although different objectives could be incorporated. Learning-based methods offer earnest computational advantages as they can repeatedly solve problems with limited interference from the decision-maker. We specifically focus on end-to-end deep learning models that leverage the attention mechanism and multiple solution trajectories. These models have seen several successful applications in routing problems. However, since travel times are not a direct input to these models due to the large dimensions of the travel time matrix, accounting for uncertainty is a challenge, especially in the presence of multiple objectives. In turn, we propose a model that simultaneously addresses stochasticity and multi-objectivity and provide a refined training mechanism for this model through scenario clustering to reduce training time. Our results show that our model is capable of constructing a Pareto Front of good quality within acceptable run times compared to three baselines.

Related papers

MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning [68.91090643731987]
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems. Existing approaches are limited to separate fields and can only handle multi-agent decision-making with a single objective. We propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem.
arXiv Detail & Related papers (2026-02-28T16:25:22Z)
TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning [24.158051656957166]
Current autonomous driving systems often favor end-to-end frameworks, which take sensor inputs like images and learn to map them into trajectory space via neural networks. Previous work has demonstrated that models can achieve better planning performance when provided with a prior distribution of possible trajectories.
arXiv Detail & Related papers (2025-12-08T03:40:10Z)
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners [60.75160178669076]
We show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online reinforcement learning. We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL.
arXiv Detail & Related papers (2025-05-29T06:41:45Z)
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks. We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost. We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion. By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
Many-Objective Multi-Solution Transport [36.07360460509921]
Many-objective multi-solution Transport (MosT) is a framework that finds multiple diverse solutions in the Pareto front of many objectives. MosT formulates the problem as a bi-level optimization of weighted objectives for each solution, where the weights are defined by an optimal transport between the objectives and solutions.
arXiv Detail & Related papers (2024-03-06T23:03:12Z)
Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal. By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
arXiv Detail & Related papers (2023-12-13T08:00:26Z)
Multi-Target Multiplicity: Flexibility and Fairness in Target Specification under Resource Constraints [76.84999501420938]
We introduce a conceptual and computational framework for assessing how the choice of target affects individuals' outcomes. We show that the level of multiplicity that stems from target variable choice can be greater than that stemming from nearly-optimal models of a single target.
arXiv Detail & Related papers (2023-06-23T18:57:14Z)
Gradient Optimization for Single-State RMDPs [0.0]
Modern problems such as autonomous driving, control of robotic components, and medical diagnostics have become increasingly difficult to solve analytically. Data-driven solutions are a strong option where there are problems with more dimensions of complexity than can be understood by people. Unfortunately, data-driven models often come with uncertainty in how they will perform in the worst of scenarios. In fields such as autonomous driving and medicine, the consequences of these failures could be catastrophic.
arXiv Detail & Related papers (2022-09-25T18:50:02Z)
A Distributional View on Multi-Objective Policy Optimization [24.690800846837273]
We propose an algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
arXiv Detail & Related papers (2020-05-15T13:02:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.