On-Device Domain Generalization
- URL: http://arxiv.org/abs/2209.07521v1
- Date: Thu, 15 Sep 2022 17:59:31 GMT
- Title: On-Device Domain Generalization
- Authors: Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change
Loy, Ziwei Liu
- Abstract summary: Domain generalization is critical to on-device machine learning applications.
We find that knowledge distillation is a strong candidate for solving the problem.
We propose a simple idea called out-of-distribution knowledge distillation (OKD), which aims to teach the student how the teacher handles (synthetic) out-of-distribution data.
- Score: 93.79736882489982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a systematic study of domain generalization (DG) for tiny neural
networks, a problem that is critical to on-device machine learning applications
but has been overlooked in the literature where research has been focused on
large models only. Tiny neural networks have much fewer parameters and lower
complexity, and thus should not be trained the same way as their large
counterparts for DG applications. We find that knowledge distillation is a
strong candidate for solving the problem: it outperforms, by a large margin,
state-of-the-art DG methods that were developed using large models. Moreover,
we observe that the teacher-student performance gap on test data with domain
shift is bigger than that on in-distribution data. To improve DG for tiny
neural networks without increasing the deployment cost, we propose a simple
idea called out-of-distribution knowledge distillation (OKD), which aims to
teach the student how the teacher handles (synthetic) out-of-distribution data
and proves to be a promising framework for solving the problem. We also
contribute a scalable method of creating DG datasets, called DOmain Shift in
COntext (DOSCO), which can be applied to broad data at scale without much human
effort. Code and models are released at
https://github.com/KaiyangZhou/on-device-dg.
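The core OKD recipe can be sketched in a few lines: the student is trained with the usual label loss plus a distillation loss that matches the teacher's soft predictions on both the original images and a synthetically perturbed, out-of-distribution view of them. The perturbation, temperature, loss weighting, and toy models below are illustrative assumptions rather than the authors' exact implementation (see the released code for that).

```python
# Hedged sketch of out-of-distribution knowledge distillation (OKD): the
# student matches the teacher's soft predictions on both the original images
# and a synthetically perturbed "OOD" view of them, in addition to the usual
# label loss. Perturbation, temperature, and weighting are assumptions.
import torch
import torch.nn.functional as F

def okd_loss(student, teacher, images, labels, make_ood, T=4.0, alpha=0.5):
    """Cross-entropy on labels + distillation on in-distribution and OOD data."""
    ood_images = make_ood(images)                       # synthetic OOD view
    with torch.no_grad():                               # teacher is frozen
        t_id = F.softmax(teacher(images) / T, dim=1)
        t_ood = F.softmax(teacher(ood_images) / T, dim=1)
    s_id, s_ood = student(images), student(ood_images)

    ce = F.cross_entropy(s_id, labels)
    kd_id = F.kl_div(F.log_softmax(s_id / T, dim=1), t_id,
                     reduction="batchmean") * T * T
    kd_ood = F.kl_div(F.log_softmax(s_ood / T, dim=1), t_ood,
                      reduction="batchmean") * T * T
    return ce + alpha * (kd_id + kd_ood)

# Toy usage: linear models stand in for the tiny student and large teacher,
# and additive noise stands in for the synthetic OOD data source.
student = torch.nn.Linear(32, 10)
teacher = torch.nn.Linear(32, 10)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = okd_loss(student, teacher, x, y,
                make_ood=lambda t: t + 0.5 * torch.randn_like(t))
loss.backward()
```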
Related papers
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
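A minimal sketch of the train-small / evaluate-large idea, assuming a fully convolutional model: with no flatten or dense layer, the network accepts any spatial size, so it can be fit on small windows and then applied to an arbitrarily large signal in a single pass. The architecture and sizes below are illustrative, not the paper's model.

```python
# Fully convolutional toy network: trained on small windows, evaluated on a
# much larger signal without any architectural change (sizes are assumptions).
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),                      # per-location prediction head
)

small_windows = torch.randn(4, 1, 32, 32)     # training-time crops
large_signal = torch.randn(1, 1, 512, 512)    # evaluation-time input

print(fcn(small_windows).shape)   # torch.Size([4, 1, 32, 32])
print(fcn(large_signal).shape)    # torch.Size([1, 1, 512, 512])
```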
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- CNN Feature Map Augmentation for Single-Source Domain Generalization [6.053629733936548]
Domain Generalization (DG) has gained significant traction during the past few years.
The goal in DG is to produce models which continue to perform well when presented with data distributions different from the ones available during training.
We propose an alternative regularization technique for convolutional neural network architectures in the single-source DG image classification setting.
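The summary does not spell out the exact augmentation, so the sketch below only illustrates the general pattern of regularizing a CNN by perturbing its intermediate feature maps during training; the Gaussian-noise module and its placement are assumptions.

```python
# Illustrative regularizer: perturb intermediate CNN feature maps at training
# time and leave them untouched at test time. Gaussian noise and the module's
# placement are assumptions; the paper's specific augmentation may differ.
import torch
import torch.nn as nn

class FeatureMapNoise(nn.Module):
    """Adds Gaussian noise to feature maps in train mode; identity in eval mode."""
    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        return x + self.std * torch.randn_like(x) if self.training else x

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    FeatureMapNoise(std=0.1),                 # augmentation in feature space
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)

logits = model(torch.randn(2, 3, 64, 64))     # forward pass with augmented features
```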
arXiv Detail & Related papers (2023-05-26T08:48:17Z)
- When Neural Networks Fail to Generalize? A Model Sensitivity Perspective [82.36758565781153]
Domain generalization (DG) aims to train a model to perform well in unseen domains under different distributions.
This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG).
We empirically identify a property of a model that correlates strongly with its generalization ability, which we coin "model sensitivity".
We propose a novel strategy of Spectral Adversarial Data Augmentation (SADA) to generate augmented images targeted at the highly sensitive frequencies.
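A hedged sketch of frequency-domain augmentation in the spirit of SADA is given below: the spectrum of an image is perturbed inside a chosen band of spatial frequencies. In the paper the sensitive frequencies are found adversarially via model sensitivity; the fixed band and noise scale here are placeholders.

```python
# Illustrative frequency-domain perturbation: scale the spectrum of an image
# inside a fixed ring of spatial frequencies. SADA chooses the frequencies
# adversarially based on model sensitivity; the band and noise scale here are
# placeholders, not the paper's procedure.
import torch

def perturb_frequencies(img, band=(8, 24), scale=0.3):
    """img: (C, H, W) float tensor; returns a frequency-perturbed copy."""
    C, H, W = img.shape
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H) - H // 2,
                            torch.arange(W) - W // 2, indexing="ij")
    radius = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
    ring = (radius >= band[0]) & (radius < band[1])         # selected frequencies
    noise = 1.0 + scale * ring.float() * torch.randn(H, W)  # perturb only the ring
    out = torch.fft.ifft2(torch.fft.ifftshift(spec * noise, dim=(-2, -1)))
    return out.real

augmented = perturb_frequencies(torch.rand(3, 64, 64))
```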
arXiv Detail & Related papers (2022-12-01T20:15:15Z)
- Online Cross-Layer Knowledge Distillation on Graph Neural Networks with Deep Supervision [6.8080936803807734]
Graph neural networks (GNNs) have become one of the most popular research topics in both academia and industry.
Large-scale datasets are posing great challenges for deploying GNNs in edge devices with limited resources.
We propose a novel online knowledge distillation framework called Alignahead++ in this paper.
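The summary gives only the high-level idea, so the following is a generic sketch of online cross-layer alignment between two peer students trained together: layer-l embeddings of one model are pulled toward layer-(l+1) embeddings of the other, with no pre-trained teacher. The MSE loss, matching dimensions, and detached targets are assumptions, not Alignahead++'s exact formulation.

```python
# Generic sketch of online cross-layer alignment between two peer students;
# the specific alignment scheme and losses of Alignahead++ may differ.
import torch
import torch.nn.functional as F

def cross_layer_alignment(feats_a, feats_b):
    """feats_*: lists of per-layer node embeddings, each of shape (N, D)."""
    loss = 0.0
    for l in range(len(feats_a) - 1):
        loss = loss + F.mse_loss(feats_a[l], feats_b[l + 1].detach())
        loss = loss + F.mse_loss(feats_b[l], feats_a[l + 1].detach())
    return loss

# Toy example: two 3-layer models producing 64-dim embeddings for 10 nodes.
feats_a = [torch.randn(10, 64, requires_grad=True) for _ in range(3)]
feats_b = [torch.randn(10, 64, requires_grad=True) for _ in range(3)]
cross_layer_alignment(feats_a, feats_b).backward()
```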
arXiv Detail & Related papers (2022-10-25T03:21:20Z)
- Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph-structured data (DFAD-GNN).
Specifically, DFAD-GNN employs a generative adversarial network with three components: a pre-trained teacher model and a student model, which act as two discriminators, and a generator that produces training graphs used to distill knowledge from the teacher into the student.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
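A minimal sketch of that data-free adversarial loop follows: the generator synthesizes inputs, the student minimizes its disagreement with the frozen teacher on them, and the generator is then updated to maximize that disagreement. DFAD-GNN generates graphs and uses GNNs; plain vectors and linear layers are used below only to keep the sketch short and self-contained.

```python
# Minimal sketch of data-free adversarial distillation with toy linear models;
# vectors stand in for graphs purely for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(32, 5)                      # stands in for a pre-trained teacher
for p in teacher.parameters():
    p.requires_grad_(False)
student = nn.Linear(32, 5)
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):
    fake = generator(torch.randn(64, 16))       # synthesized training samples

    # Student step: match the teacher's predictions on the generated samples.
    loss_s = F.l1_loss(F.softmax(student(fake.detach()), dim=1),
                       F.softmax(teacher(fake.detach()), dim=1))
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

    # Generator step: produce samples on which student and teacher disagree.
    loss_g = -F.l1_loss(F.softmax(student(fake), dim=1),
                        F.softmax(teacher(fake), dim=1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```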
arXiv Detail & Related papers (2022-05-08T08:19:40Z)
- Network Gradient Descent Algorithm for Decentralized Federated Learning [0.2867517731896504]
We study a fully decentralized federated learning algorithm, a novel gradient descent algorithm executed over a communication-based network.
In the network gradient descent (NGD) method, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy leakage.
We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency.
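A toy sketch of the communication pattern: every node takes a gradient step on its local data and then averages its parameter estimate with its neighbors' estimates, so only statistics cross the network. The ring topology, least-squares objective, and step size below are illustrative assumptions.

```python
# Decentralized "network gradient descent" sketch: local gradient steps plus
# neighbor averaging of parameter estimates; raw data never leave a node.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_local, dim, lr = 8, 50, 5, 0.05
true_w = rng.normal(size=dim)

# Local datasets (never shared) and per-node parameter estimates.
X = [rng.normal(size=(n_local, dim)) for _ in range(n_nodes)]
y = [X[k] @ true_w + 0.1 * rng.normal(size=n_local) for k in range(n_nodes)]
w = [np.zeros(dim) for _ in range(n_nodes)]

# Ring topology: each node averages with itself and its two neighbors.
W = np.zeros((n_nodes, n_nodes))
for k in range(n_nodes):
    for j in (k - 1, k, k + 1):
        W[k, j % n_nodes] = 1.0 / 3.0

for step in range(200):
    grads = [X[k].T @ (X[k] @ w[k] - y[k]) / n_local for k in range(n_nodes)]
    local = [w[k] - lr * grads[k] for k in range(n_nodes)]
    # Communication step: mix neighbors' parameter estimates only.
    w = [sum(W[k, j] * local[j] for j in range(n_nodes)) for k in range(n_nodes)]

print(np.linalg.norm(np.mean(w, axis=0) - true_w))  # estimates approach true_w
```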
arXiv Detail & Related papers (2022-05-06T02:53:31Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
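The self-ensembling component can be sketched as a mean-teacher style update: the teacher's weights are an exponential moving average (EMA) of the student's, forming a temporal ensemble. The discriminator and segmentation-specific parts of SE-GAN are omitted, and the decay value is an assumption.

```python
# EMA ("self-ensembling") teacher update: the teacher tracks a moving average
# of the student's weights and is never trained by gradient descent itself.
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

student = torch.nn.Conv2d(3, 8, 3, padding=1)   # stand-in for a segmentation net
teacher = copy.deepcopy(student)                 # initialized from the student
for p in teacher.parameters():
    p.requires_grad_(False)

# Called after each student optimization step:
ema_update(teacher, student)
```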
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
- Domain-Irrelevant Representation Learning for Unsupervised Domain Generalization [22.980607134596077]
Domain generalization (DG) aims to help models trained on a set of source domains generalize better on unseen target domains.
While unlabeled data are far more accessible, we seek to explore how unsupervised learning can help deep models generalize across domains.
We propose a Domain-Irrelevant Unsupervised Learning (DIUL) method to cope with the significant and misleading heterogeneity within unlabeled data.
arXiv Detail & Related papers (2021-07-13T16:20:50Z)
- Wide Network Learning with Differential Privacy [7.453881927237143]
The current generation of neural networks suffers a significant loss in accuracy under most practically relevant privacy training regimes.
We develop a general approach towards training these models that takes advantage of the sparsity of the gradients of private Empirical Risk Minimization (ERM).
Using the same number of parameters, we propose a novel algorithm for privately training neural networks.
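For context, a generic DP-SGD step (per-example gradient clipping plus Gaussian noise) is sketched below; the paper's contribution is a more parameter-efficient variant that exploits gradient sparsity, which is not reproduced here. The clipping norm and noise multiplier are illustrative.

```python
# Generic DP-SGD step: clip each example's gradient, sum, add Gaussian noise,
# average, and apply the update. Hyperparameters here are assumptions.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(20, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
clip, noise_mult = 1.0, 1.1
x, y = torch.randn(16, 20), torch.randint(0, 2, (16,))

summed = [torch.zeros_like(p) for p in model.parameters()]
for i in range(len(x)):                          # per-example gradients
    model.zero_grad()
    F.cross_entropy(model(x[i:i + 1]), y[i:i + 1]).backward()
    norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
    scale = torch.clamp(clip / (norm + 1e-12), max=1.0)
    for s, p in zip(summed, model.parameters()):
        s.add_(scale * p.grad)                   # accumulate clipped gradient

with torch.no_grad():                            # noisy averaged update
    for p, s in zip(model.parameters(), summed):
        p.grad = (s + noise_mult * clip * torch.randn_like(s)) / len(x)
opt.step()
```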
arXiv Detail & Related papers (2021-03-01T20:31:50Z)