Attention, Filling in The Gaps for Generalization in Routing Problems
- URL: http://arxiv.org/abs/2207.07212v1
- Date: Thu, 14 Jul 2022 21:36:51 GMT
- Title: Attention, Filling in The Gaps for Generalization in Routing Problems
- Authors: Ahmad Bdeir, Jonas K. Falkner, Lars Schmidt-Thieme
- Abstract summary: This paper aims to encourage the consolidation of the field by understanding and improving existing models.
We first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention.
We then target inherent differences through a mixed-instance training method that has been shown to outperform single-instance training in certain scenarios.
- Score: 5.210197476419621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) methods have become a useful tool for tackling vehicle
routing problems, either in combination with popular heuristics or as
standalone models. However, current methods suffer from poor generalization
when tackling problems of different sizes or different distributions. As a
result, ML in vehicle routing has witnessed an expansion phase in which new
methodologies are created for particular problem instances but become
infeasible at larger problem sizes.
This paper aims to encourage the consolidation of the field through
understanding and improving existing models, namely the attention model
by Kool et al. We identify two discrepancy categories for VRP generalization.
The first is based on the differences that are inherent to the problems
themselves, and the second relates to architectural weaknesses that limit the
model's ability to generalize. Our contribution is threefold: we first
target model discrepancies by adapting the Kool et al. method and its loss
function for Sparse Dynamic Attention based on the alpha-entmax activation. We
then target inherent differences through a mixed-instance training method
that has been shown to outperform single-instance training in certain
scenarios. Finally, we introduce a framework for inference-level data
augmentation that improves performance by leveraging the model's lack of
invariance to rotation and dilation changes.
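To make the first contribution concrete, here is a minimal sketch of how the
alpha-entmax activation can replace softmax inside scaled dot-product
attention, using the open-source entmax package. The function name, tensor
shapes, and the fixed alpha value are illustrative assumptions, not the
paper's exact implementation.

```python
# Sparse attention via alpha-entmax in place of softmax (sketch).
# Requires the `entmax` package: pip install entmax
import math
import torch
from entmax import entmax_bisect

def sparse_dot_product_attention(q, k, v, alpha=1.5):
    """Scaled dot-product attention with alpha-entmax instead of softmax.

    q, k, v: (batch, heads, nodes, dim). With alpha > 1, entmax can assign
    exactly zero weight to low-scoring nodes, giving sparse attention;
    alpha = 1 recovers softmax and alpha = 2 recovers sparsemax.
    """
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = entmax_bisect(scores, alpha=alpha, dim=-1)  # sparse probabilities
    return weights @ v

q = k = v = torch.randn(2, 8, 20, 16)  # toy shapes for a 20-node instance
out = sparse_dot_product_attention(q, k, v)
```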
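The second contribution, mixed-instance training, can be read as sampling
each training batch from a randomly chosen problem size (and, analogously, a
randomly chosen coordinate distribution) instead of one fixed configuration.
A minimal sketch under that reading; the size set and uniform coordinates are
assumptions:

```python
# Mixed-instance batch sampling (sketch): the problem size varies between
# batches so a single model is trained on a mixture of instance types.
import numpy as np

def sample_mixed_batch(batch_size, sizes=(20, 50, 100), rng=None):
    """Return (batch_size, n, 2) node coordinates with n drawn per batch."""
    rng = rng or np.random.default_rng()
    n = int(rng.choice(sizes))                    # pick a problem size
    return rng.uniform(size=(batch_size, n, 2))   # nodes in the unit square
```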
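The third contribution, inference-level augmentation, exploits the model's
lack of invariance to rotation and dilation: solve several transformed copies
of an instance and keep the best tour, measured on the original coordinates.
Because the identity transform is included, the result is at least as good as
a single solve. The sketch below assumes a trained solver exposed as a
`solve` callable; the transform grid and centering choice are illustrative,
not the paper's exact framework.

```python
# Inference-level augmentation via rotations and dilations (sketch).
import numpy as np

def augmented_solve(coords, solve, n_rotations=8, scales=(0.8, 1.0, 1.2)):
    """coords: (n, 2) node coordinates; solve: callable returning (tour, cost)."""
    center = coords.mean(axis=0)
    best_tour, best_cost = None, float("inf")
    for theta in np.linspace(0.0, 2 * np.pi, n_rotations, endpoint=False):
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        for s in scales:
            transformed = (coords - center) @ rot.T * s + center
            tour, _ = solve(transformed)
            # Evaluate on the ORIGINAL coordinates so tours found under
            # different transforms stay comparable.
            cost = np.linalg.norm(
                coords[tour] - coords[np.roll(tour, -1)], axis=1).sum()
            if cost < best_cost:
                best_tour, best_cost = tour, cost
    return best_tour, best_cost
```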
Related papers
- Diffusing States and Matching Scores: A New Framework for Imitation Learning [16.941612670582522]
Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function.
In recent years, diffusion models have emerged as a non-adversarial alternative to GANs.
We show our approach outperforms GAN-style imitation learning baselines across various continuous control problems.
arXiv Detail & Related papers (2024-10-17T17:59:25Z) - Weight Scope Alignment: A Frustratingly Easy Method for Model Merging [40.080926444789085]
Non-I.I.D. data poses a huge challenge for averaging-based model fusion.
In this paper, we reveal variations in weight scope under different training conditions, shedding light on its influence on model merging.
Fortunately, the parameters in each layer roughly follow a Gaussian distribution, which inspires a novel and simple regularization approach.
arXiv Detail & Related papers (2024-08-22T09:13:27Z) - Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture [9.244633039170186]
We propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder.
ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes.
The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model's representation space.
arXiv Detail & Related papers (2024-06-10T09:03:17Z) - Prompt Learning for Generalized Vehicle Routing [17.424910810870273]
This work investigates an efficient prompt learning approach in neural combinatorial optimization for cross-distribution adaptation.
The proposed model learns a set of prompts among various distributions and then selects the best-matched one to prompt a pre-trained attention model for each problem instance.
It also outperforms existing generalized models on both in-distribution prediction and zero-shot generalization to a diverse set of new tasks.
arXiv Detail & Related papers (2024-05-20T15:42:23Z) - Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation [62.738582127114704]
Adar is a framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) solvers.
arXiv Detail & Related papers (2023-10-22T03:15:36Z) - Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when the diffusion timestep t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z) - Towards Omni-generalizable Neural Methods for Vehicle Routing Problems [14.210085924625705]
This paper studies a challenging yet realistic setting, which considers generalization across both size and distribution in VRPs.
We propose a generic meta-learning framework, which enables effective training of a model capable of fast adaptation to new tasks during inference.
arXiv Detail & Related papers (2023-05-31T06:14:34Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Adaptive Fine-Grained Sketch-Based Image Retrieval [100.90633284767205]
Recent focus on Fine-Grained Sketch-Based Image Retrieval has shifted towards generalising a model to new categories.
In real-world applications, a trained FG-SBIR model is often applied to both new categories and different human sketchers.
We introduce a novel model-agnostic meta-learning (MAML) based framework with several key modifications.
arXiv Detail & Related papers (2022-07-04T21:07:20Z) - Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning [141.35105358670316]
We study the difference between a naïvely trained initial-phase model and the oracle model.
We propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly.
Our CwD is simple to implement and easy to plug into existing methods.
arXiv Detail & Related papers (2021-12-09T07:20:32Z) - Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees [115.08148491584997]
We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be modified for settings with batches and for federated learning with partial client participation.
arXiv Detail & Related papers (2021-10-07T10:04:32Z)