Learning Generalizable Models for Vehicle Routing Problems via Knowledge
Distillation
- URL: http://arxiv.org/abs/2210.07686v1
- Date: Fri, 14 Oct 2022 10:23:23 GMT
- Title: Learning Generalizable Models for Vehicle Routing Problems via Knowledge
Distillation
- Authors: Jieyi Bi, Yining Ma, Jiahai Wang, Zhiguang Cao, Jinbiao Chen, Yuan
Sun, Yeow Meng Chee
- Abstract summary: Recent neural methods for vehicle routing problems typically train and test the deep models on the same instance distribution.
We propose an Adaptive Multi-Distribution Knowledge Distillation scheme for learning more generalizable deep models.
Our AMDKD is generic and consumes fewer computational resources for inference.
- Score: 23.483671660119384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent neural methods for vehicle routing problems typically train and test
the deep models on the same instance distribution (i.e., uniform). To tackle the
consequent cross-distribution generalization concerns, we bring knowledge
distillation to this field and propose an Adaptive Multi-Distribution Knowledge
Distillation (AMDKD) scheme for learning more generalizable deep models.
Particularly, our AMDKD leverages various knowledge from multiple teachers
trained on exemplar distributions to yield a lightweight yet generalist
student model. Meanwhile, we equip AMDKD with an adaptive strategy that allows
the student to concentrate on difficult distributions, so as to absorb
hard-to-master knowledge more effectively. Extensive experimental results show
that, compared with the baseline neural methods, our AMDKD is able to achieve
competitive results on both unseen in-distribution and out-of-distribution
instances, which are either randomly synthesized or adopted from benchmark
datasets (i.e., TSPLIB and CVRPLIB). Notably, our AMDKD is generic and
consumes fewer computational resources for inference.
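The core recipe (several teachers trained on exemplar distributions, and a student distilled with weights that shift toward the distributions it currently finds hard) can be sketched as follows. This is an illustrative toy, not the paper's exact formulation: the logits, temperature, and softmax weighting rule below are stand-ins.

```python
import numpy as np

def softmax(z, tau=1.0):
    z = np.asarray(z, dtype=float) / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, tau=2.0):
    """Soft-label distillation loss: KL(teacher || student) at temperature tau."""
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))

def adaptive_weights(per_dist_losses, alpha=1.0):
    """Concentrate the student on difficult distributions:
    higher distillation loss -> higher weight (softmax re-weighting)."""
    return softmax(alpha * np.asarray(per_dist_losses))

# Two hypothetical teachers (e.g. one per exemplar node distribution):
pairs = [([2.0, 0.5], [1.8, 0.7]),   # student already matches this teacher well
         ([0.1, 3.0], [2.5, 0.2])]   # student disagrees with this teacher
losses = [kd_loss(s, t) for s, t in pairs]
weights = adaptive_weights(losses)
total_loss = float(np.dot(weights, losses))
```

With these numbers the second distribution incurs the larger distillation loss and therefore receives the larger weight, mirroring the adaptive strategy described above.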
Related papers
- Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z)
- DAFT: Distilling Adversarially Fine-tuned Models for Better OOD Generalization [35.53270942633211]
We consider the problem of OOD generalization, where the goal is to train a model that performs well on test distributions that are different from the training distribution.
We propose a new method - DAFT - based on the intuition that adversarially robust combination of a large number of rich features should provide OOD robustness.
We evaluate DAFT on standard benchmarks in the DomainBed framework, and demonstrate that DAFT achieves significant improvements over the current state-of-the-art OOD generalization methods.
arXiv Detail & Related papers (2022-08-19T03:48:17Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Learning to Solve Routing Problems via Distributionally Robust Optimization [14.506553345693536]
Recent deep models for solving routing problems assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability.
We exploit group distributionally robust optimization (group DRO) to tackle this issue, where we jointly optimize the weights for different groups of distributions and the parameters for the deep model in an interleaved manner during training.
We also design a module based on a convolutional neural network, which allows the deep model to learn more informative latent patterns among the nodes.
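As a toy illustration of the interleaved scheme (not the paper's model), one can alternate an exponentiated update on the group weights with a gradient step on the re-weighted loss. The scalar "model" and the group targets below are purely hypothetical:

```python
import numpy as np

def group_dro_step(theta, w, targets, eta_w=0.5, eta_theta=0.1):
    """One interleaved group-DRO update on a toy scalar model:
    groups with higher loss receive exponentially larger weight,
    then the model takes a gradient step on the re-weighted loss."""
    losses = (theta - targets) ** 2              # per-group loss
    w = w * np.exp(eta_w * losses)               # upweight hard groups
    w = w / w.sum()
    grad = np.dot(w, 2 * (theta - targets))      # gradient of the weighted loss
    return theta - eta_theta * grad, w, losses

targets = np.array([0.0, 1.0, 4.0])              # three node "distributions" (toy)
theta, w = 0.0, np.ones(3) / 3
for _ in range(100):
    theta, w, losses = group_dro_step(theta, w, targets)
```

The weights end up concentrated on the two extreme groups, whose losses cannot both be driven to zero, which is exactly the worst-case focus that group DRO is meant to provide.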
arXiv Detail & Related papers (2022-02-15T08:06:44Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
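For reference, the classical k-NN density estimate that the summary alludes to fits in a few lines (a generic textbook sketch, not the paper's implementation):

```python
from math import gamma, pi

import numpy as np

def knn_density(x, data, k=5):
    """k-NN density estimate: p(x) ~ k / (n * V_d * r_k^d), where r_k is the
    distance from x to its k-th nearest sample and V_d is the volume of the
    d-dimensional unit ball."""
    n, d = data.shape
    r_k = np.sort(np.linalg.norm(data - x, axis=1))[k - 1]
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (n * v_d * r_k ** d)

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))          # stand-in for output-space features
p_center = knn_density(np.zeros(2), samples, k=10)
p_tail = knn_density(np.array([4.0, 4.0]), samples, k=10)
```

As expected, the estimate is much higher near the mode of the sampling distribution than out in its tail.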
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Deep Stable Learning for Out-Of-Distribution Generalization [27.437046504902938]
Approaches based on deep neural networks have achieved striking performance when testing data and training data share a similar distribution.
Eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models.
We propose to address this problem by removing the dependencies between features via learning weights for training samples.
arXiv Detail & Related papers (2021-04-16T03:54:21Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
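In spirit (the exact objective is in the paper), mixup-based distillation interpolates pairs of training examples and matches the student to the teacher on the interpolated inputs. A minimal sketch, where every callable is a hypothetical stand-in:

```python
import numpy as np

def mixup(x1, x2, lam):
    """Linear interpolation of two inputs (mixup)."""
    return lam * x1 + (1.0 - lam) * x2

def mixkd_loss(student, teacher, x1, x2, lam):
    """Match the student to the teacher on the mixed example
    (squared error stands in for the actual distillation objective)."""
    xm = mixup(x1, x2, lam)
    return float(np.sum((student(xm) - teacher(xm)) ** 2))

teacher = lambda x: 2.0 * x + 1.0           # hypothetical teacher model
good_student = lambda x: 2.0 * x + 1.0      # already matches the teacher
weak_student = lambda x: np.zeros_like(x)   # ignores the input entirely
x1, x2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
loss_good = mixkd_loss(good_student, teacher, x1, x2, lam=0.3)
loss_weak = mixkd_loss(weak_student, teacher, x1, x2, lam=0.3)
```

Mixed inputs lie off the training points, which is what lets the teacher supervise the student on examples it never memorized, the intuition behind the stronger generalization claim above.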
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Residual Knowledge Distillation [96.18815134719975]
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant model (A).
In this way, the student (S) is trained to mimic the feature maps of the teacher (T), and A aids this process by learning the residual error between them.
Experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet.
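The assistant's role can be illustrated with linear feature maps (a toy, not RKD's actual networks): fit A to the residual between teacher and student features, and adding its output closes part of the gap.

```python
import numpy as np

# Toy residual distillation: the assistant regresses the residual error
# between "teacher" and "student" feature maps (all matrices are synthetic).
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
feat_t = x @ rng.normal(size=(4, 3))                     # teacher feature maps
feat_s = 0.8 * feat_t + 0.1 * rng.normal(size=(64, 3))   # imperfect student features
residual = feat_t - feat_s                               # assistant's target
A, *_ = np.linalg.lstsq(feat_s, residual, rcond=None)    # fit the assistant
err_before = float(np.mean((feat_t - feat_s) ** 2))
err_after = float(np.mean((feat_t - (feat_s + feat_s @ A)) ** 2))
```

Because the assistant regresses exactly the residual error, adding its prediction to the student's features recovers much of the gap to the teacher.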
arXiv Detail & Related papers (2020-02-21T07:49:26Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.