Towards Mitigating Architecture Overfitting on Distilled Datasets
- URL: http://arxiv.org/abs/2309.04195v2
- Date: Tue, 07 Jan 2025 08:46:02 GMT
- Title: Towards Mitigating Architecture Overfitting on Distilled Datasets
- Authors: Xuyang Zhong, Chen Liu
- Abstract summary: This paper introduces a series of approaches to mitigate the issue of architecture overfitting. Specifically, DropPath renders the large model an implicit ensemble of its sub-networks, and knowledge distillation ensures each sub-network acts similarly to the small but well-performing teacher network. Our approaches achieve comparable or even superior performance when the test network is larger than the training network.
- Score: 2.3371504588528635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation methods have demonstrated remarkable performance for neural networks trained with very limited training data. However, a significant challenge arises in the form of architecture overfitting: a distilled training dataset synthesized by a specific network architecture (i.e., the training network) yields poor performance when used to train other network architectures (i.e., test networks), especially when the test networks have a larger capacity than the training network. This paper introduces a series of approaches to mitigate this issue. Among them, DropPath renders the large model an implicit ensemble of its sub-networks, and knowledge distillation ensures each sub-network acts similarly to the small but well-performing teacher network. These methods, characterized by their smoothing effects, significantly mitigate architecture overfitting. We conduct extensive experiments to demonstrate the effectiveness and generality of our methods across various scenarios involving different tasks and different sizes of distilled data. Furthermore, our approaches achieve comparable or even superior performance when the test network is larger than the training network.
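As a rough illustration of the two ingredients named in the abstract, the sketch below shows a residual block equipped with DropPath (stochastic depth) and a standard knowledge-distillation loss in PyTorch. It is a minimal sketch rather than the authors' implementation; the block structure, drop probability, temperature T, and mixing weight alpha are placeholder choices.

```python
# Illustrative sketch only (not the paper's code): DropPath + knowledge distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropPath(nn.Module):
    """Randomly drops the residual branch per sample during training."""
    def __init__(self, drop_prob: float = 0.1):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if not self.training or self.drop_prob == 0.0:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli mask per sample, broadcast over the remaining dimensions.
        shape = (x.shape[0],) + (1,) * (x.dim() - 1)
        mask = (torch.rand(shape, device=x.device) < keep_prob).to(x.dtype)
        return x * mask / keep_prob  # rescale so the expected activation is unchanged


class ResidualBlock(nn.Module):
    """A plain residual block whose branch is subject to DropPath."""
    def __init__(self, channels: int, drop_prob: float = 0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.drop_path = DropPath(drop_prob)

    def forward(self, x):
        # Randomly skipping the branch makes the network behave like an
        # implicit ensemble of its sub-networks.
        return F.relu(x + self.drop_path(self.body(x)))


def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.5):
    """Cross-entropy on labels plus a KL term toward a (frozen) teacher network."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In this picture, DropPath supplies the implicit sub-network ensemble and the KD term keeps each sub-network close to the small, well-performing teacher trained on the distilled data.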
Related papers
- Uncertainty estimation via ensembles of deep learning models and dropout layers for seismic traces [27.619194576741673]
We develop Convolutional Neural Networks (CNNs) to classify seismic waveforms based on first-motion polarity.
We construct ensembles of networks to estimate uncertainty.
We observe that the uncertainty estimation ability of the ensembles of networks can be enhanced using dropout layers.
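The combination of ensembling and dropout described above can be illustrated with a short sketch: keep dropout active at inference (Monte Carlo dropout) and average softmax outputs over ensemble members and stochastic passes, using the spread as an uncertainty estimate. The toy 1-D CNN, ensemble size, and number of passes below are assumptions, not details from the paper.

```python
# Illustrative sketch only: uncertainty from an ensemble of dropout-equipped classifiers.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_member(n_classes: int = 2) -> nn.Module:
    # A toy 1-D CNN for waveform snippets; the architecture is a placeholder.
    return nn.Sequential(
        nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
        nn.Dropout(p=0.3),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(16, n_classes),
    )


@torch.no_grad()
def predict_with_uncertainty(members, x, mc_passes: int = 10):
    """Average softmax outputs over ensemble members and stochastic dropout passes."""
    probs = []
    for m in members:
        m.train()  # keep dropout active at inference (Monte Carlo dropout)
        for _ in range(mc_passes):
            probs.append(F.softmax(m(x), dim=1))
    probs = torch.stack(probs)          # (members * passes, batch, classes)
    return probs.mean(0), probs.std(0)  # predictive mean and its spread


members = [make_member() for _ in range(5)]
x = torch.randn(8, 1, 256)              # batch of 8 single-channel traces
mean_p, std_p = predict_with_uncertainty(members, x)
```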
arXiv Detail & Related papers (2024-10-08T15:22:15Z) - Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs [48.406728896785296]
We propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks.
Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation.
arXiv Detail & Related papers (2024-07-15T08:42:10Z) - Efficient and Accurate Hyperspectral Image Demosaicing with Neural Network Architectures [3.386560551295746]
This study investigates the effectiveness of neural network architectures in hyperspectral image demosaicing.
We introduce a range of network models and modifications, and compare them with classical methods and existing reference network approaches.
Results indicate that our networks outperform or match reference models on both datasets, demonstrating exceptional performance.
arXiv Detail & Related papers (2023-12-21T08:02:49Z) - The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold [21.431022906309334]
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training.
Networks with different architectures follow distinguishable trajectories, whereas other factors have minimal influence.
Larger networks train along a manifold similar to that of smaller networks, just faster; and networks at very different parts of the prediction space converge to the solution along a similar manifold.
arXiv Detail & Related papers (2023-05-02T17:09:07Z) - FedHeN: Federated Learning in Heterogeneous Networks [52.29110497518558]
We propose a novel training recipe for federated learning with heterogeneous networks.
We introduce a side training objective for higher-complexity devices to jointly train different architectures in a federated setting.
arXiv Detail & Related papers (2022-07-07T01:08:35Z) - Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a state similar to that of networks trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
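The matching objective can be pictured as follows: unroll a few training steps on the distilled data from a recorded checkpoint, then penalize the distance between the parameters reached and a later checkpoint from training on real data. The sketch below is a simplified, hedged rendering (it assumes torch.func.functional_call from PyTorch 2.x); the step count, learning rate, and normalization are illustrative rather than the paper's exact settings.

```python
# Simplified sketch of trajectory matching for dataset distillation.
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch 2.x


def trajectory_matching_loss(model, syn_x, syn_y, start_params, target_params,
                             inner_steps: int = 5, inner_lr: float = 0.01):
    """Unroll a short training run on the synthetic data and compare the resulting
    parameters with `target_params` recorded from real-data training.
    `start_params` / `target_params`: dicts of detached tensors keyed like
    model.named_parameters(); `syn_x` is the learnable distilled data."""
    params = {k: v.detach().clone().requires_grad_(True) for k, v in start_params.items()}
    for _ in range(inner_steps):
        logits = functional_call(model, params, (syn_x,))
        inner_loss = F.cross_entropy(logits, syn_y)
        grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
        params = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}
    # Squared distance to the expert parameters, normalised by how far the
    # real-data trajectory moved from the same starting point.
    num = sum(((params[k] - target_params[k]) ** 2).sum() for k in params)
    den = sum(((start_params[k] - target_params[k]) ** 2).sum() for k in params) + 1e-12
    return num / den


# Outer loop (sketch): syn_x.requires_grad_(True); loss.backward(); then an
# optimizer step on syn_x steers the distilled data toward the real-data state.
```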
arXiv Detail & Related papers (2022-03-22T17:58:59Z) - Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z) - DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction [20.51885543358098]
We propose DHEN - a deep and hierarchical ensemble architecture that can leverage the strengths of heterogeneous interaction modules and learn a hierarchy of interactions of different orders.
Experiments on a large-scale dataset from CTR prediction tasks attained a 0.27% improvement in the Normalized Entropy of the predictions and 1.2x better training throughput over the state-of-the-art baseline.
arXiv Detail & Related papers (2022-03-11T21:19:31Z) - Towards Federated Bayesian Network Structure Learning with Continuous Optimization [14.779035801521717]
We present a cross-silo federated learning approach to estimate the structure of a Bayesian network.
We develop a distributed structure learning method based on continuous optimization.
arXiv Detail & Related papers (2021-10-18T14:36:05Z) - PDFNet: Pointwise Dense Flow Network for Urban-Scene Segmentation [0.0]
We propose a novel lightweight architecture named point-wise dense flow network (PDFNet).
In PDFNet, we employ dense, residual, and multiple shortcut connections to allow a smooth gradient flow to all parts of the network.
Our method significantly outperforms baselines in capturing small classes and in few-data regimes.
arXiv Detail & Related papers (2021-09-21T10:39:46Z) - Unsupervised Domain-adaptive Hash for Networks [81.49184987430333]
Domain-adaptive hash learning has enjoyed considerable success in the computer vision community.
We develop an unsupervised domain-adaptive hash learning method for networks, dubbed UDAH.
arXiv Detail & Related papers (2021-08-20T12:09:38Z) - Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we show that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, improves both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z) - Task-Adaptive Neural Network Retrieval with Meta-Contrastive Learning [34.27089256930098]
We propose a novel neural network retrieval method, which retrieves the most suitable pre-trained network for a given task.
We train this framework by meta-learning a cross-modal latent space with contrastive loss, to maximize the similarity between a dataset and a network.
We validate the efficacy of our method on ten real-world datasets, against existing NAS baselines.
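A minimal sketch of the cross-modal contrastive objective, under the assumption that separate encoders already map datasets and networks into a shared latent space: matched (dataset, network) pairs are pulled together and mismatched pairs pushed apart with a symmetric InfoNCE-style loss. The temperature value is an arbitrary placeholder.

```python
# Illustrative sketch, not the paper's code: symmetric contrastive loss over
# dataset embeddings and network embeddings.
import torch
import torch.nn.functional as F


def cross_modal_contrastive_loss(dataset_emb, network_emb, temperature: float = 0.07):
    """dataset_emb, network_emb: (batch, dim); row i of each forms a matched pair."""
    d = F.normalize(dataset_emb, dim=1)
    n = F.normalize(network_emb, dim=1)
    logits = d @ n.t() / temperature          # cosine similarities of all pairs
    targets = torch.arange(d.size(0), device=d.device)
    # Symmetric cross-entropy: dataset -> network and network -> dataset.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```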
arXiv Detail & Related papers (2021-03-02T06:30:51Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in mixed-privacy setting.
We show that our method allows forgetting without having to trade off model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z) - Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing a neural network to learn both its size and topology during the course of gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z) - Dataset Condensation with Gradient Matching [36.14340188365505]
We propose a training set synthesis technique for data-efficient learning, called Dataset Condensation, which learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch.
We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
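The gradient-matching idea behind Dataset Condensation can be sketched as below: the synthetic samples are updated so that the gradients they induce in a network resemble the gradients induced by real data, measured here layer-wise by cosine distance. This is a simplified illustration, not the authors' released code; in practice the network is re-initialized and updated in an alternating loop.

```python
# Simplified sketch of gradient matching for dataset condensation.
import torch
import torch.nn.functional as F


def gradient_matching_loss(model, real_x, real_y, syn_x, syn_y):
    """Layer-wise cosine distance between real-data and synthetic-data gradients.
    syn_x must have requires_grad=True; an outer optimizer updates it from this loss."""
    params = [p for p in model.parameters() if p.requires_grad]

    real_loss = F.cross_entropy(model(real_x), real_y)
    g_real = torch.autograd.grad(real_loss, params)                    # treated as constants

    syn_loss = F.cross_entropy(model(syn_x), syn_y)
    g_syn = torch.autograd.grad(syn_loss, params, create_graph=True)   # keep graph for syn_x

    loss = 0.0
    for gr, gs in zip(g_real, g_syn):
        gr, gs = gr.reshape(-1), gs.reshape(-1)
        loss = loss + (1.0 - F.cosine_similarity(gr.detach(), gs, dim=0))
    return loss
```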
arXiv Detail & Related papers (2020-06-10T16:30:52Z) - Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
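One way to picture this strategy is a small graph convolutional regressor that maps a sub-network's architecture graph (node features plus a normalized adjacency with self-loops) to a predicted accuracy, so candidates can be ranked without training each one. The layer sizes and mean pooling below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: a tiny GCN that predicts the accuracy of a sampled sub-network.
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: row-normalized adjacency with self-loops, shape (nodes, nodes)
        return torch.relu(self.lin(adj @ x))


class AccuracyPredictor(nn.Module):
    def __init__(self, node_feat_dim: int, hidden: int = 64):
        super().__init__()
        self.gcn1 = GCNLayer(node_feat_dim, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        h = self.gcn2(self.gcn1(x, adj), adj)
        return self.head(h.mean(dim=0))   # pool node features, predict a scalar accuracy


# Such a predictor would be fitted with, e.g., an MSE or ranking loss on
# (architecture graph, measured accuracy) pairs sampled from the supernet.
```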
arXiv Detail & Related papers (2020-04-17T19:12:39Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.