USPR: Learning a Unified Solver for Profiled Routing
- URL: http://arxiv.org/abs/2505.05119v1
- Date: Thu, 08 May 2025 10:42:57 GMT
- Title: USPR: Learning a Unified Solver for Profiled Routing
- Authors: Chuanbo Hua, Federico Berto, Zhikai Zhao, Jiwoo Son, Changhyun Kwon, Jinkyoo Park
- Abstract summary: Profiled Vehicle Routing Problem (PVRP) incorporates vehicle-client-specific preferences and constraints. Recent reinforcement learning (RL) solvers require retraining for each new profile distribution. We introduce USPR, a novel framework that handles arbitrary profile types.
- Score: 15.136899433821894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Profiled Vehicle Routing Problem (PVRP) extends the classical VRP by incorporating vehicle-client-specific preferences and constraints, reflecting real-world requirements such as zone restrictions and service-level preferences. While recent reinforcement learning (RL) solvers have shown promise, they require retraining for each new profile distribution, suffer from poor representation ability, and struggle to generalize to out-of-distribution instances. In this paper, we address these limitations by introducing USPR (Unified Solver for Profiled Routing), a novel framework that natively handles arbitrary profile types. USPR introduces three key innovations: (i) Profile Embeddings (PE) to encode any combination of profile types; (ii) Multi-Head Profiled Attention (MHPA), an attention mechanism that models rich interactions between vehicles and clients; (iii) Profile-aware Score Reshaping (PSR), which dynamically adjusts decoder logits using profile scores to improve generalization. Empirical results on diverse PVRP benchmarks demonstrate that USPR achieves state-of-the-art results among learning-based methods while offering significant gains in flexibility and computational efficiency. We make our source code publicly available to foster future research at https://github.com/ai4co/uspr.
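The abstract names three concrete components, so a minimal PyTorch sketch may help readers skimming the list see how Profile Embeddings, MHPA, and PSR could fit together. All tensor shapes, the MLP-to-bias design, and the `alpha` knob below are our assumptions for illustration; the authoritative implementation is in the linked repository.

```python
import torch
import torch.nn as nn


class MultiHeadProfiledAttention(nn.Module):
    """Sketch: vehicle-to-client cross-attention with an additive bias
    derived from vehicle-client profiles. Shapes and the MLP-to-bias
    design are illustrative assumptions, not the authors' exact layer."""

    def __init__(self, dim: int, num_heads: int, num_profile_types: int):
        super().__init__()
        self.num_heads = num_heads
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Profile Embeddings (sketch): fuse any combination of raw
        # profile scores into one additive attention bias per pair.
        self.profile_mlp = nn.Sequential(
            nn.Linear(num_profile_types, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, veh, cli, profiles):
        # veh: (B, V, dim), cli: (B, C, dim),
        # profiles: (B, V, C, num_profile_types) raw profile scores
        bias = self.profile_mlp(profiles).squeeze(-1)         # (B, V, C)
        mask = bias.repeat_interleave(self.num_heads, dim=0)  # (B*H, V, C)
        out, _ = self.attn(veh, cli, cli, attn_mask=mask)     # biased attention
        return out


def profile_aware_score_reshape(logits, profile_scores, alpha=1.0):
    """Profile-aware Score Reshaping (sketch): shift decoder logits by
    scaled profile scores so preferred vehicle-client pairs become more
    likely to be selected. `alpha` is a hypothetical knob, not a
    documented hyperparameter."""
    return logits + alpha * profile_scores
```

Passing the profile bias as a float `attn_mask` reuses the standard attention machinery; whether USPR injects profiles this way or through a dedicated score path is a design choice the abstract leaves open.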
Related papers
- CAMP: Collaborative Attention Model with Profiles for Vehicle Routing Problems [15.136899433821894]
The profiled vehicle routing problem (PVRP) is a generalization of the heterogeneous capacitated vehicle routing problem (HCVRP). We propose a novel approach that learns efficient solvers for PVRP using multi-agent reinforcement learning.
arXiv Detail & Related papers (2025-01-06T12:37:56Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - FedAli: Personalized Federated Learning Alignment with Prototype Layers for Generalized Mobile Services [9.683642138601464]
Federated Alignment (FedAli) is a prototype-based regularization technique that enhances inter-client alignment while strengthening the robustness of personalized adaptations. At its core, FedAli introduces the ALignment with Prototypes layer, inspired by human memory, to enhance generalization. Our experiments show that FedAli significantly enhances client generalization while preserving strong personalization in heterogeneous settings.
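As a hedged reading of "prototype-based regularization", the sketch below keeps a bank of learnable prototypes and pulls each hidden representation toward its nearest one; the nearest-prototype rule and the MSE pull term are our assumptions, not FedAli's exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeAlignmentLayer(nn.Module):
    """Sketch in the spirit of an alignment-with-prototypes layer:
    shared prototypes act as anchors that keep client representations
    aligned across the federation. Details are assumptions."""

    def __init__(self, dim: int, num_prototypes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))

    def forward(self, h):
        # h: (B, dim). Find each sample's nearest prototype ...
        dists = torch.cdist(h, self.prototypes)          # (B, P)
        nearest = self.prototypes[dists.argmin(dim=1)]   # (B, dim)
        # ... and expose a pull-toward-prototype penalty the client
        # can add to its task loss during local training.
        align_loss = F.mse_loss(h, nearest.detach())
        return h, align_loss
```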
arXiv Detail & Related papers (2024-11-15T21:35:21Z) - Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture [9.244633039170186]
We propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder. ESF adjusts the model's attention weight pattern toward familiar patterns discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs from multiple training distribution patterns through multiple auxiliary light decoders, expanding the model's representation space.
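One hedged way to read the ESF idea is entropy matching: at an unseen instance size, rescale the attention logits until their entropy returns to a reference value recorded at training sizes. The bisection search and the entropy-matching criterion below are our assumptions, not the paper's exact scaling factor.

```python
import torch


def entropy_scaled_softmax(scores, target_entropy, iters=20):
    """Sketch of an entropy-based scaling factor: bisect for a
    temperature that brings the attention distribution's entropy close
    to a reference value from training. Not the paper's exact ESF."""
    lo, hi = 0.1, 10.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        p = torch.softmax(scores * mid, dim=-1)
        entropy = -(p * p.clamp_min(1e-12).log()).sum(-1).mean()
        if entropy > target_entropy:
            lo = mid  # too flat at this size: sharpen with a larger scale
        else:
            hi = mid  # too sharp: soften with a smaller scale
    return torch.softmax(scores * ((lo + hi) / 2), dim=-1)
```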
arXiv Detail & Related papers (2024-06-10T09:03:17Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL. We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or similar to that of PPO and DPO.
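"Regressing relative rewards" compresses to a single squared-error objective: fit the scaled difference of log-probability ratios between two completions of the same prompt to their reward difference. Argument names below are ours, and the placement of `eta` should be treated as an approximation of the published form.

```python
import torch


def rebel_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
               reward_a, reward_b, eta=1.0):
    """Sketch of REBEL's core regression over a pair of completions
    (a, b): the log-ratio difference under the new vs. old policy is
    regressed onto the reward difference."""
    ratio_diff = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    return ((ratio_diff / eta - (reward_a - reward_b)) ** 2).mean()
```

Replacing clipped policy-gradient machinery with one regression per pair is what makes the algorithm "minimalist" in the summary's sense.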
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity [82.5448598805968]
We present an effective and adaptable federated framework FedP3, representing Federated Personalized and Privacy-friendly network Pruning.
We offer a theoretical interpretation of FedP3 and its locally differentially private variant, DP-FedP3, and theoretically validate their efficiency.
arXiv Detail & Related papers (2024-04-15T14:14:05Z) - Fine-Tuning Language Models with Reward Learning on Policy [68.70065254564642]
Reinforcement learning from human feedback (RLHF) has emerged as an effective approach to aligning large language models (LLMs) to human preferences.
Despite its popularity, (fixed) reward models may become inaccurate off-distribution as policy optimization shifts the data distribution.
We propose reward learning on policy (RLP), an unsupervised framework that refines a reward model using policy samples to keep it on-distribution.
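The summary describes a refinement loop over policy samples rather than a specific loss; a heavily hedged sketch of that data flow is below. `policy.generate` and `unsup_loss_fn` are hypothetical placeholders, and the actual RLP procedure (which pairs representation learning with synthetic preferences) is richer than this loop.

```python
import torch


def refine_reward_model(reward_model, policy, prompts,
                        unsup_loss_fn, optimizer, steps=100):
    """Sketch of reward learning on policy: keep the reward model
    on-distribution by repeatedly optimizing an unsupervised objective
    over fresh samples drawn from the current policy."""
    for step in range(steps):
        batch = prompts[step % len(prompts)]
        with torch.no_grad():
            samples = policy.generate(batch)  # hypothetical sampling API
        loss = unsup_loss_fn(reward_model, batch, samples)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```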
arXiv Detail & Related papers (2024-03-28T10:02:10Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - SplitGP: Achieving Both Generalization and Personalization in Federated Learning [31.105681433459285]
SplitGP captures generalization and personalization capabilities for efficient inference across resource-constrained clients.
We analytically characterize the convergence behavior of SplitGP, revealing that all client models asymptotically approach stationary points.
Experimental results show that SplitGP outperforms existing baselines by wide margins in inference time and test accuracy for varying amounts of out-of-distribution samples.
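One hedged reading of "both generalization and personalization for efficient inference" is a split model where a small personalized client-side part answers confident queries locally and defers the rest to a generalized server-side part. The max-probability threshold rule below is our assumption about the offloading criterion, not SplitGP's exact mechanism.

```python
import torch
import torch.nn as nn


class SplitInference(nn.Module):
    """Sketch of split generalization/personalization inference."""

    def __init__(self, client_model: nn.Module, server_model: nn.Module,
                 threshold: float = 0.8):
        super().__init__()
        self.client_model = client_model  # small, personalized, on-device
        self.server_model = server_model  # larger, generalized, remote
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        local = torch.softmax(self.client_model(x), dim=-1)
        if local.max() >= self.threshold:  # in-distribution: answer locally
            return local
        return torch.softmax(self.server_model(x), dim=-1)  # defer to server
```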
arXiv Detail & Related papers (2022-12-16T08:37:24Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
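The two-line summary compresses a min-max procedure: a parametric adversary reweights examples via likelihood ratios while the model minimizes the reweighted loss. The in-batch weight normalization and the KL pull toward uniform weights below are our choices for a self-contained sketch, not necessarily the paper's exact setup.

```python
import torch


def parametric_dro_step(model, adversary, x, y, loss_fn,
                        model_opt, adv_opt, kl_coef=1.0):
    """Sketch of one joint DRO update with a parametric
    likelihood-ratio adversary."""
    losses = loss_fn(model(x), y)              # per-example losses, shape (B,)
    log_ratio = adversary(x).squeeze(-1)       # unnormalized log-weights, (B,)
    w = torch.softmax(log_ratio, dim=0) * len(losses)  # mean-one weights

    # Model descends the reweighted loss (weights treated as constants).
    model_loss = (w.detach() * losses).mean()
    model_opt.zero_grad()
    model_loss.backward()
    model_opt.step()

    # Adversary ascends the reweighted loss, with a KL penalty toward
    # uniform weights so the implied distribution shift stays plausible.
    kl = (w / len(losses) * w.clamp_min(1e-12).log()).sum()
    adv_loss = -(w * losses.detach()).mean() + kl_coef * kl
    adv_opt.zero_grad()
    adv_loss.backward()
    adv_opt.step()
```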
arXiv Detail & Related papers (2022-04-13T12:43:12Z)