Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP
- URL: http://arxiv.org/abs/2510.21453v1
- Date: Fri, 24 Oct 2025 13:31:31 GMT
- Title: Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP
- Authors: Yuxin Pan, Zhiguang Cao, Chengyang Gu, Liu Liu, Peilin Zhao, Yize Chen, Fangzhen Lin
- Abstract summary: We propose a framework that enables unified solvers to perceive the shared-component nature across VRP variants. We introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of basis state spaces. A Latent Space-based SDMDP extension is developed by incorporating both the optimal basis policies and a learnable mixture function.
- Score: 57.28979643999352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing neural methods for multi-task vehicle routing problems (VRPs) typically learn unified solvers to handle multiple constraints simultaneously. However, they often underutilize the compositional structure of VRP variants, each derivable from a common set of basis VRP variants. This critical oversight causes unified solvers to miss out on the potential benefits of basis solvers, each specialized for a basis VRP variant. To overcome this limitation, we propose a framework that enables unified solvers to perceive the shared-component nature across VRP variants by proactively reusing basis solvers, while mitigating the exponential growth of trained neural solvers. Specifically, we introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of basis state spaces associated with basis VRP variants. More crucially, this formulation inherently yields the optimal basis policy for each basis VRP variant. Furthermore, a Latent Space-based SDMDP extension is developed by incorporating both the optimal basis policies and a learnable mixture function to enable policy reuse in the latent space. Under mild assumptions, this extension provably recovers the optimal unified policy of SDMDP through the mixture function, which computes the state embedding as a mapping from the basis state embeddings generated by the optimal basis policies. For practical implementation, we introduce the Mixture-of-Specialized-Experts Solver (MoSES), which realizes basis policies through specialized Low-Rank Adaptation (LoRA) experts and implements the mixture function via an adaptive gating mechanism. Extensive experiments conducted across VRP variants showcase the superiority of MoSES over prior methods.
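To make the MoSES design concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a shared backbone augmented with specialized LoRA experts, mixed by an adaptive gate that plays the role of the learnable mixture function. This is not the authors' code; all class, function, and parameter names are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the authors' implementation):
# mixture-of-LoRA-experts layer with an adaptive gate, approximating
# the MoSES construction described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One specialized low-rank adapter, standing in for a basis-VRP policy."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(dim, rank) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(rank, dim))         # up-projection, zero-init

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Low-rank update applied to the shared representation.
        return h @ self.A @ self.B

class MoLoRALayer(nn.Module):
    """Shared linear backbone plus K LoRA experts mixed by an adaptive gate."""
    def __init__(self, dim: int, num_experts: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)  # shared backbone weight
        self.experts = nn.ModuleList(
            [LoRAExpert(dim, rank) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # adaptive gating mechanism

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # The gate realizes the learnable mixture function: the unified
        # state embedding is a weighted combination of the basis state
        # embeddings produced by the specialized experts.
        w = F.softmax(self.gate(h), dim=-1)                         # (batch, K)
        basis = torch.stack([e(h) for e in self.experts], dim=-1)   # (batch, dim, K)
        mixed = (basis * w.unsqueeze(1)).sum(-1)                    # (batch, dim)
        return self.base(h) + mixed

# Usage: a VRP variant combining basis constraints (e.g., capacity plus
# time windows) would route its embeddings through the corresponding experts.
layer = MoLoRALayer(dim=128, num_experts=4)
out = layer(torch.randn(32, 128))  # -> (32, 128)
```

In this reading, each LoRA expert stands in for the embedding function of one optimal basis policy, and the softmax gate implements the mixture that maps basis state embeddings to the unified state embedding; the actual MoSES gating and training procedure may differ.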
Related papers
- Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution [76.66229730098759]
In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models. We propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution. We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert.
arXiv Detail & Related papers (2025-11-20T04:11:44Z)
- Learning Branching Policies for MILPs with Proximal Policy Optimization [0.0]
Branch-and-Bound (B&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILPs). Current approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances. In this work, we propose Tree-Gate Proximal Policy Optimization, a novel framework that employs Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to train a branching policy.
arXiv Detail & Related papers (2025-11-17T05:16:14Z)
- Neural Index Policies for Restless Multi-Action Bandits with Heterogeneous Budgets [2.9059410824803655]
We introduce a Neural Index Policy (NIP) for multi-action RMABs with heterogeneous budget constraints. NIP unifies index prediction and constrained optimization in a single end-to-end differentiable framework. Empirically, NIP achieves near-optimal performance, within 5% of the oracle occupancy-measure policy.
arXiv Detail & Related papers (2025-10-24T23:08:36Z)
- Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation [10.35045003737115]
Decision-making under distribution shift is a central challenge in reinforcement learning (RL), where training and deployment environments differ. We propose DR-RPO, a model-free online policy optimization method that learns robust policies with sublinear regret. We show that DR-RPO can achieve suboptimality bounds and sample efficiency in robust RL, matching the performance of value-based approaches.
arXiv Detail & Related papers (2025-10-16T02:56:58Z)
- Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards [47.557539197058496]
We introduce Random Policy Valuation for Diverse Reasoning (ROVER), a minimalist yet highly effective RL method that samples actions from a softmax over uniform-policy Q-values. It demonstrates superior performance in both quality (+8.2 on pass@1, +16.8 on pass@256) and diversity (+17.6%).
arXiv Detail & Related papers (2025-09-29T16:09:07Z)
- Towards Generalizable Neural Solvers for Vehicle Routing Problems via Ensemble with Transferrable Local Policy [24.91781032046481]
Many neural construction methods for Vehicle Routing Problems (VRPs) focus on synthetic problem instances with specified node distributions and limited scales.
We design an auxiliary policy, named the local policy, that learns from transferable local topological features, and integrate it with a typical construction policy to form an ensemble policy.
With joint training, the aggregated policies perform cooperatively and complementarily to boost generalization.
arXiv Detail & Related papers (2023-08-27T13:22:50Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)