LRM-1B: Towards Large Routing Model
- URL: http://arxiv.org/abs/2507.03300v1
- Date: Fri, 04 Jul 2025 05:10:20 GMT
- Title: LRM-1B: Towards Large Routing Model
- Authors: Han Li, Fei Liu, Zhenkun Wang, Qingfu Zhang
- Abstract summary: Vehicle routing problems (VRPs) are central to combinatorial optimization with significant practical implications. Recent advancements in neural combinatorial optimization (NCO) have demonstrated promising results by leveraging neural networks to solve VRPs. This study introduces a Large Routing Model with 1 billion parameters (LRM-1B) designed to address diverse VRP scenarios.
- Score: 26.18687224390521
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vehicle routing problems (VRPs) are central to combinatorial optimization with significant practical implications. Recent advancements in neural combinatorial optimization (NCO) have demonstrated promising results by leveraging neural networks to solve VRPs, yet model scaling within this domain remains underexplored. Inspired by the success of model scaling in large language models (LLMs), this study introduces a Large Routing Model with 1 billion parameters (LRM-1B), designed to address diverse VRP scenarios. We present a comprehensive evaluation of LRM-1B across multiple problem variants, distributions, and sizes, establishing state-of-the-art results. Our findings reveal that LRM-1B not only adapts to different VRP challenges but also showcases superior performance, outperforming existing models. Additionally, we explore the scaling behavior of neural routing models from 1M to 1B parameters. Our analysis confirms a power-law relationship between multiple model factors and performance, offering critical insights into the optimal configurations for foundation neural routing solvers.
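A power law between model scale and performance is a straight line in log-log space, so it can be checked with a simple linear fit. The sketch below illustrates this on hypothetical (model size, optimality gap) pairs; the numbers are placeholders for illustration, not results from the paper.

```python
import numpy as np

# Hypothetical (model size, optimality gap) pairs spanning 1M to 1B parameters;
# placeholders for illustration, not measurements from the paper.
params = np.array([1e6, 1e7, 1e8, 1e9])
gaps = np.array([4.0, 2.5, 1.6, 1.0])   # optimality gap in %

# A power law gap(N) = a * N**(-b) is a straight line in log-log space:
# log(gap) = log(a) - b * log(N).
slope, intercept = np.polyfit(np.log(params), np.log(gaps), deg=1)
a, b = np.exp(intercept), -slope
print(f"gap(N) ~= {a:.2f} * N**(-{b:.3f})")
```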
Related papers
- Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning [41.40603531008809]
We evaluate current Large Brainwave Foundation Models (LBMs) through systematic fine-tuning experiments. Our analysis shows that state-of-the-art LBMs achieve only marginal improvements (0.9%-1.2%) over traditional deep architectures.
arXiv Detail & Related papers (2025-07-01T21:21:42Z) - Diffusion Models as Network Optimizers: Explorations and Analysis [71.69869025878856]
Generative diffusion models (GDMs) have emerged as a promising new approach to network optimization. In this study, we first explore the intrinsic characteristics of generative models. We provide a concise theoretical and intuitive demonstration of the advantages of generative models over discriminative approaches to network optimization.
arXiv Detail & Related papers (2024-11-01T09:05:47Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
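As a rough illustration of ensembling single-task models in weight space, the sketch below mixes expert parameter tensors with a softmax gate. The shapes and the `fuse_weights` helper are hypothetical, not the paper's exact MoE module.

```python
import torch

def fuse_weights(expert_weights: list[torch.Tensor], gate_logits: torch.Tensor) -> torch.Tensor:
    """Convex combination of expert parameter tensors, weighted by a softmax gate."""
    gate = torch.softmax(gate_logits, dim=0)           # one mixing weight per expert
    stacked = torch.stack(expert_weights, dim=0)       # (n_experts, *param_shape)
    return torch.einsum("e,e...->...", gate, stacked)  # weighted sum over experts

# Fuse three hypothetical task-specific weight matrices for one trade-off point.
experts = [torch.randn(64, 64) for _ in range(3)]
fused = fuse_weights(experts, gate_logits=torch.tensor([0.2, 1.5, -0.3]))
```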
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture [9.244633039170186]
We propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder. ESF adjusts the attention weight pattern of the model towards familiar patterns discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model's representation space.
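One common way to realize entropy-based scaling of attention is to rescale logits by a log-size ratio, so that attention entropy at an unseen problem size stays close to what the model saw in training. The sketch below uses that log-ratio factor as an illustrative assumption; the paper's exact formulation may differ.

```python
import math
import torch

def esf_attention(q, k, v, n_train: int, n_test: int):
    d = q.size(-1)
    # Sharpen logits as the number of keys grows, counteracting the natural
    # entropy increase of a softmax over more candidates (illustrative choice).
    esf = math.log(n_test) / math.log(n_train)
    logits = (q @ k.transpose(-2, -1)) / math.sqrt(d) * esf
    return torch.softmax(logits, dim=-1) @ v
```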
arXiv Detail & Related papers (2024-06-10T09:03:17Z) - MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts [26.790392171537754]
We propose a multi-task vehicle routing solver with mixture-of-experts (MVMoE).
We develop a hierarchical gating mechanism for the MVMoE, delivering a good trade-off between empirical performance and computational complexity.
Experimentally, our method significantly promotes zero-shot generalization performance on 10 unseen VRP variants.
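A hierarchical gate can be pictured as a coarse gate that first selects a group of experts and a fine gate that mixes experts inside the chosen group, so only a fraction of experts is evaluated per input. The class below is a generic two-level sketch under these assumptions, not MVMoE's exact design.

```python
import torch
import torch.nn as nn

class HierarchicalMoE(nn.Module):
    """Coarse gate picks an expert group; fine gate mixes experts inside it."""

    def __init__(self, dim: int, n_groups: int = 2, experts_per_group: int = 4):
        super().__init__()
        self.group_gate = nn.Linear(dim, n_groups)
        self.expert_gates = nn.ModuleList(nn.Linear(dim, experts_per_group) for _ in range(n_groups))
        self.experts = nn.ModuleList(
            nn.ModuleList(nn.Linear(dim, dim) for _ in range(experts_per_group))
            for _ in range(n_groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (dim,) - a single token embedding, for clarity.
        g = int(self.group_gate(x).argmax())                 # level 1: choose a group
        w = torch.softmax(self.expert_gates[g](x), dim=-1)   # level 2: mix within it
        return sum(w[i] * expert(x) for i, expert in enumerate(self.experts[g]))

y = HierarchicalMoE(dim=32)(torch.randn(32))
```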
arXiv Detail & Related papers (2024-05-02T06:02:07Z) - Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization [18.298695520665348]
Vehicle routing problems (VRPs) can be found in numerous real-world applications.
In this work, we make the first attempt to tackle the crucial challenge of cross-problem generalization.
Our proposed model can successfully solve VRPs with unseen attribute combinations in a zero-shot generalization manner.
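The underlying attribute view can be sketched as follows: every VRP variant is a combination of binary attributes, and zero-shot generalization means handling a combination never seen during training. The attribute names below are illustrative.

```python
# Each VRP variant is encoded as a binary vector over shared attributes
# (illustrative attribute set; the paper's exact list may differ).
ATTRIBUTES = ("capacity", "open_route", "backhaul", "duration_limit", "time_windows")

def variant_vector(active: set[str]) -> list[int]:
    return [int(a in active) for a in ATTRIBUTES]

# Trained on, e.g., CVRP and VRPTW ...
cvrp = variant_vector({"capacity"})
vrptw = variant_vector({"capacity", "time_windows"})
# ... then asked zero-shot about an unseen attribute combination.
unseen = variant_vector({"capacity", "open_route", "backhaul", "time_windows"})
```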
arXiv Detail & Related papers (2024-02-23T13:25:23Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
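The data-model co-design can be sketched as a learnable input-space prompt trained jointly with a heavily pruned network. The sketch below pairs a visual prompt with simple one-shot magnitude pruning as a stand-in; the VPNs framework's actual prompt design and pruning schedule differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small CNN to be sparsified, plus a learnable input-space visual prompt.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 10),
)
prompt = nn.Parameter(torch.zeros(1, 3, 32, 32))  # trained jointly with the model

# One-shot magnitude pruning: zero out 90% of each weight tensor.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.9)

x = torch.randn(8, 3, 32, 32)
logits = model(x + prompt)  # the prompt shifts the data to suit the sparse model
```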
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting [0.0]
Transformer models have the potential to improve load forecasting because of their ability to learn long-range dependencies via the attention mechanism.
Our work compares the proposed Transformer-based neural network model integrated with different metaheuristic algorithms by their performance in load forecasting, based on numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).
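Differential evolution only needs a black-box objective over a bounded search space, so the selection loop can be sketched with SciPy's `differential_evolution`. The `train_and_eval_mse` objective below is a toy stand-in for actually training the forecaster, and the hyper-parameter bounds are assumptions.

```python
from scipy.optimize import differential_evolution

def train_and_eval_mse(x):
    """Toy stand-in: decode hyper-parameters and return a validation MSE."""
    log10_lr, heads, width_per_head = x
    n_heads = int(round(heads))
    d_model = n_heads * int(round(width_per_head))  # keep width divisible by heads
    # Real use: train the Transformer forecaster with (10**log10_lr, n_heads,
    # d_model) and return its validation MSE; here, a smooth toy surrogate.
    return (log10_lr + 3) ** 2 + (n_heads - 4) ** 2 + (d_model - 256) ** 2 / 1e3

bounds = [(-5, -1), (1, 8), (8, 128)]  # log10 lr, heads, width per head (assumed)
result = differential_evolution(train_and_eval_mse, bounds, maxiter=20, seed=0)
print(result.x, result.fun)
```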
arXiv Detail & Related papers (2023-07-28T04:29:53Z) - Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset and the FashionMNIST vs MNIST dataset.
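The batch-ensemble mechanism keeps one shared weight matrix and gives each ensemble member cheap rank-1 modulation factors. Below is a minimal sketch of such a layer with illustrative shapes; it shows the mechanism, not BE-SNNs' full architecture.

```python
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    """Member i uses W * (s_i r_i^T): one shared W plus rank-1 factors per member."""

    def __init__(self, d_in: int, d_out: int, n_members: int):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out, bias=False)      # shared slow weight W
        self.r = nn.Parameter(torch.ones(n_members, d_in))    # input rank-1 factors
        self.s = nn.Parameter(torch.ones(n_members, d_out))   # output rank-1 factors

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        # ((x * r_i) W^T) * s_i == x (W * s_i r_i^T)^T, at rank-1 extra cost.
        return self.shared(x * self.r[member]) * self.s[member]

layer = BatchEnsembleLinear(16, 32, n_members=4)
y = layer(torch.randn(8, 16), member=2)   # predictions of ensemble member 2
```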
arXiv Detail & Related papers (2022-06-26T16:00:22Z) - Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
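The layer-wise idea can be sketched with a hard neuron alignment, a special case of optimal transport: match the neurons of one model to the other, then average the aligned weights. The sketch below handles a single layer in isolation; full fusion must also propagate each layer's permutation to the next layer's incoming weights.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layer(w_a: np.ndarray, w_b: np.ndarray) -> np.ndarray:
    """w_a, w_b: (n_neurons, fan_in) weight matrices of the same layer."""
    cost = -w_a @ w_b.T                    # negative similarity between neurons
    _, cols = linear_sum_assignment(cost)  # permutation aligning B's neurons to A's
    return 0.5 * (w_a + w_b[cols])         # average in A's neuron ordering

fused = fuse_layer(np.random.randn(8, 4), np.random.randn(8, 4))
```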
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.