Related papers: Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

URL: http://arxiv.org/abs/2406.06652v2
Date: Mon, 17 Jun 2024 14:02:57 GMT
Title: Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture
Authors: Yubin Xiao, Di Wang, Xuan Wu, Yuesong Wu, Boyang Li, Wei Du, Liupu Wang, You Zhou
Abstract summary: We propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space.
Score: 9.244633039170186
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance the size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using ESF and DS decoder to obtain a more generalizable model and showcase their applicability to solve different VRP variants, i.e., travelling salesman problem and capacitated VRP. Notably, our proposed generic components require minimal computational resources, and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.
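The abstract says ESF steers attention weights toward the pattern seen during training when the problem size changes. The sketch below illustrates only that general idea and is not the paper's ESF formula: it assumes a single attention row with synthetic Gaussian logits, and the helper `scale_for_target_entropy` and the sizes 100 and 1000 are hypothetical choices made for illustration. It uses the fact that the entropy of `softmax(scale * logits)` does not increase as `scale` grows, so a bisection search can find a scale whose entropy matches the value measured at the training size.

```python
import math
import random


def softmax(logits, scale=1.0):
    """Numerically stable softmax of scale * logits (scale >= 0)."""
    m = max(logits)
    exps = [math.exp(scale * (x - m)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def scale_for_target_entropy(logits, target, lo=0.0, hi=64.0, iters=60):
    """Hypothetical helper: bisection for a logit scale that hits `target`.

    Assumes the target lies between the entropies at `lo` and `hi`.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(softmax(logits, mid)) > target:
            lo = mid  # attention still too flat: sharpen further
        else:
            hi = mid  # attention too peaked: relax
    return 0.5 * (lo + hi)


# Synthetic attention logits (an assumption, not data from the paper).
rng = random.Random(0)
train_logits = [rng.gauss(0.0, 1.0) for _ in range(100)]   # "training" size
test_logits = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # larger instance

target = entropy(softmax(train_logits))    # entropy seen at training size
unscaled = entropy(softmax(test_logits))   # flatter attention on more nodes
scale = scale_for_target_entropy(test_logits, target)
rescaled = entropy(softmax(test_logits, scale))

print(f"target={target:.3f} unscaled={unscaled:.3f} "
      f"scale={scale:.3f} rescaled={rescaled:.3f}")
```

On the larger instance the unscaled attention is flatter than at the training size, and the searched scale sharpens it back to the training-size entropy. A closed-form factor, as the abstract's "plug-and-play" wording suggests, would avoid the per-row search used here.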

Related papers

SHIELD: Multi-task Multi-distribution Vehicle Routing Solver with Sparsity and Hierarchy [26.708590440636527]
We introduce SHIELD, a novel model that leverages both sparsity and hierarchy principles. We develop a context-based clustering layer that exploits the presence of hierarchical structures in the problems to produce better local representations. Our results demonstrate the superiority of our approach over existing methods on 9 real-world maps with 16 VRP variants each.
arXiv Detail & Related papers (2025-06-10T03:55:14Z)
Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations. To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models. We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z)
TuneNSearch: a hybrid transfer learning and local search approach for solving vehicle routing problems [43.89334324926175]
TuneNSearch is a hybrid transfer learning and local search approach for addressing different variants of vehicle routing problems (VRPs). We first pre-train a reinforcement learning model on the multi-depot VRP, followed by a short fine-tuning phase to adapt it to different variants. Results show that TuneNSearch outperforms many existing state-of-the-art models trained for each VRP variant, requiring only one-fifth of the training epochs.
arXiv Detail & Related papers (2025-03-16T21:34:11Z)
DiffSG: A Generative Solver for Network Optimization with Diffusion Model [75.27274046562806]
Diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning parameters. We propose a new framework, which leverages intrinsic distribution learning of diffusion generative models to learn high-quality solutions.
arXiv Detail & Related papers (2024-08-13T07:56:21Z)
Restore Anything Model via Efficient Degradation Adaptation [129.38475243424563]
RAM takes a unified path that leverages inherent similarities across various degradations to enable efficient and comprehensive restoration. RAM achieves SOTA performance while reducing model complexity by approximately 82% in trainable parameters and 85% in FLOPs.
arXiv Detail & Related papers (2024-07-18T10:26:53Z)
Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data. We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding [90.77521413857448]
Deep generative models are anchored in three core capabilities: generating new instances, reconstructing inputs, and learning compact representations. We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs). EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding. Experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks.
arXiv Detail & Related papers (2024-02-29T10:08:57Z)
Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization [18.298695520665348]
Vehicle routing problems (VRPs) can be found in numerous real-world applications. In this work, we make the first attempt to tackle the crucial challenge of cross-problem generalization. Our proposed model can successfully solve VRPs with unseen attribute combinations in a zero-shot generalization manner.
arXiv Detail & Related papers (2024-02-23T13:25:23Z)
SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods. Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
arXiv Detail & Related papers (2022-10-10T16:07:24Z)
A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning. We propose a novel estimation function with decomposable structural properties for optimization-based exploration. Under our framework, a new sample-efficient algorithm namely OPtimization-based ExploRation with Approximation (OPERA) is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
Attention, Filling in The Gaps for Generalization in Routing Problems [5.210197476419621]
This paper aims at encouraging the consolidation of the field through understanding and improving current existing models. We first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention. We then target inherent differences through the use of a mixed instance training method that has been shown to outperform single instance training in certain scenarios.
arXiv Detail & Related papers (2022-07-14T21:36:51Z)
Bottlenecks CLUB: Unifying Information-Theoretic Trade-offs Among Complexity, Leakage, and Utility [8.782250973555026]
Bottleneck problems are an important class of optimization problems that have recently gained increasing attention in the domain of machine learning and information theory. We propose a general family of optimization problems, termed as complexity-leakage-utility bottleneck (CLUB) model. We show that the CLUB model generalizes all these problems as well as most other information-theoretic privacy models.
arXiv Detail & Related papers (2022-07-11T14:07:48Z)
On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules. We study the generalization and adaption performance of such modular neural causal models. Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
Learning to Solve Routing Problems via Distributionally Robust Optimization [14.506553345693536]
Recent deep models for solving routing problems assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability. We exploit group distributionally robust optimization (group DRO) to tackle this issue, where we jointly optimize the weights for different groups of distributions and the parameters for the deep model in an interleaved manner during training. We also design a module based on convolutional neural network, which allows the deep model to learn more informative latent pattern among the nodes.
arXiv Detail & Related papers (2022-02-15T08:06:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.