Towards Mitigating Architecture Overfitting in Dataset Distillation
- URL: http://arxiv.org/abs/2309.04195v1
- Date: Fri, 8 Sep 2023 08:12:29 GMT
- Title: Towards Mitigating Architecture Overfitting in Dataset Distillation
- Authors: Xuyang Zhong, Chen Liu
- Abstract summary: We propose a series of approaches to both architecture design and training schemes to boost generalization performance across network architectures.
We conduct extensive experiments to demonstrate the effectiveness and generality of our methods.
- Score: 2.7610336610850292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation methods have demonstrated remarkable performance for
neural networks trained with very limited training data. However, a significant
challenge arises in the form of architecture overfitting: distilled training
data synthesized by a specific network architecture (i.e., the training
network) yields poor performance when used to train other network architectures
(i.e., test networks). This paper addresses this issue and proposes a series of
approaches to both architecture design and training schemes that can be
adopted together to boost generalization performance across different
network architectures on the distilled training data. We conduct extensive
experiments to demonstrate the effectiveness and generality of our methods.
Particularly, across various scenarios involving different sizes of distilled
data, our approaches achieve comparable or superior performance to existing
methods when training on the distilled data using networks with larger
capacities.
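To make the architecture-overfitting setting concrete, here is a minimal, hypothetical evaluation loop (not code from the paper): distilled images are produced once by some distillation method, and test networks of different capacities are then trained on them from scratch. All names and shapes below (`make_cnn`, `distilled_x`, the random toy tensors) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_on_distilled(model, distilled_x, distilled_y, epochs=300, lr=0.01):
    """Train a freshly initialized model on a small distilled set."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(distilled_x), distilled_y)
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

# Toy stand-ins: 10 distilled images per class for a 10-class task.
distilled_x = torch.randn(100, 3, 32, 32)   # would come from a distillation method
distilled_y = torch.arange(10).repeat_interleave(10)
test_x, test_y = torch.randn(500, 3, 32, 32), torch.randint(0, 10, (500,))

def make_cnn(width):
    """A simple test network; varying width mimics different capacities."""
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(width, 10))

# Architecture overfitting shows up as an accuracy gap across test networks,
# even though all of them see exactly the same distilled data.
for width in (32, 64, 128):
    model = train_on_distilled(make_cnn(width), distilled_x, distilled_y)
    print(f"width={width}: test acc = {accuracy(model, test_x, test_y):.3f}")
```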
Related papers
- Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs [48.406728896785296]
We propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks.
Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation.
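As a rough illustration of the idea (an assumed reading of the abstract, not the paper's actual method): labels pooled from several datasets can be treated as nodes of a graph, propagated through a small graph convolution, and merged when their learned embeddings nearly coincide. Every detail below, including the random adjacency and the 0.95 merge threshold, is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: average self+neighbor features, then project."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):
        a = adj + torch.eye(adj.size(0))    # add self-loops
        a = a / a.sum(dim=1, keepdim=True)  # row-normalize
        return torch.relu(self.lin(a @ x))

# Nodes are labels pooled from several datasets; edges (random here, purely
# for the sketch) would encode relations such as name similarity.
num_labels, d = 12, 16
x = torch.randn(num_labels, d)              # initial label embeddings
adj = (torch.rand(num_labels, num_labels) > 0.7).float()
adj = ((adj + adj.t()) > 0).float()         # symmetrize

gcn1, gcn2 = GCNLayer(d, d), GCNLayer(d, d)
h = gcn2(gcn1(x, adj), adj)                 # two propagation steps

# Unify labels by merging nodes whose learned embeddings nearly coincide.
sim = F.cosine_similarity(h.unsqueeze(1), h.unsqueeze(0), dim=-1)
pairs = [(i, j) for i, j in (sim > 0.95).nonzero().tolist() if i < j]
print("candidate label merges:", pairs)
```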
arXiv Detail & Related papers (2024-07-15T08:42:10Z)
- Efficient and Accurate Hyperspectral Image Demosaicing with Neural Network Architectures [3.386560551295746]
This study investigates the effectiveness of neural network architectures in hyperspectral image demosaicing.
We introduce a range of network models and modifications, and compare them with classical methods and existing reference network approaches.
Results indicate that our networks outperform or match the reference models on both datasets, demonstrating exceptional performance.
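For readers unfamiliar with the task, a minimal stand-in for a learned demosaicing network is sketched below: a single-channel mosaic captured through a multispectral filter array goes in, a full spectral cube comes out. The architecture and shapes are illustrative only; the study compares a range of far more elaborate designs.

```python
import torch
import torch.nn as nn

class DemosaicNet(nn.Module):
    """Toy demosaicing CNN: a single-channel mosaic in, a full spectral cube out.
    Shapes are illustrative; a real 4x4 MSFA would give 16 spectral bands."""
    def __init__(self, bands=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, bands, 3, padding=1))

    def forward(self, mosaic):
        return self.body(mosaic)

net = DemosaicNet()
mosaic = torch.randn(1, 1, 64, 64)   # raw sensor image with a mosaic pattern
cube = net(mosaic)                   # (1, 16, 64, 64) reconstructed bands
print(cube.shape)
```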
arXiv Detail & Related papers (2023-12-21T08:02:49Z)
- The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold [21.431022906309334]
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training.
Networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence.
Larger networks train along a manifold similar to that of smaller networks, just faster, and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
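One way to reproduce this kind of analysis (a sketch under assumptions, not the authors' exact pipeline, which uses InPCA-style embeddings) is to record each checkpoint's predictions on a fixed probe set, measure pairwise Bhattacharyya distances between the predictive distributions, and embed the distance matrix with classical MDS:

```python
import numpy as np

def bhattacharyya(p, q, eps=1e-12):
    """Mean Bhattacharyya distance between two sets of predictive distributions.
    p, q: (num_samples, num_classes) arrays of softmax outputs."""
    bc = np.sum(np.sqrt(p * q), axis=1)
    return float(np.mean(-np.log(bc + eps)))

def mds_embed(dist, k=2):
    """Classical MDS: embed a distance matrix into k dimensions."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j          # double-centered Gram matrix
    w, v = np.linalg.eigh(b)
    idx = np.argsort(w)[::-1][:k]
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Toy "training trajectory": softmax predictions on a fixed probe set,
# recorded at a few checkpoints (would come from real training runs).
rng = np.random.default_rng(0)
checkpoints = [rng.dirichlet(np.ones(10), size=200) for _ in range(8)]

n = len(checkpoints)
d = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        d[i, j] = bhattacharyya(checkpoints[i], checkpoints[j])

coords = mds_embed(d)  # each row: one checkpoint's position in the shared manifold
print(coords.round(3))
```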
arXiv Detail & Related papers (2023-05-02T17:09:07Z)
- FedHeN: Federated Learning in Heterogeneous Networks [52.29110497518558]
We propose a novel training recipe for federated learning with heterogeneous networks.
We introduce a side objective on the higher-complexity devices so that different architectures can be trained jointly in a federated setting.
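One plausible reading of the side objective (an assumption for illustration, not necessarily FedHeN's exact recipe): the high-complexity architecture embeds the low-complexity one as a subnetwork and trains an auxiliary head on the shared block, so that federated averaging over the common parameters stays meaningful. All module names below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Architecture used by low-complexity devices."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(32, 16)
        self.head = nn.Linear(16, 10)

    def forward(self, x):
        return self.head(torch.relu(self.shared(x)))

class LargeNet(nn.Module):
    """Higher-complexity architecture embedding the small net's shared block."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(32, 16)     # same shape => federated-averageable
        self.extra = nn.Linear(16, 16)
        self.head = nn.Linear(16, 10)
        self.side_head = nn.Linear(16, 10)  # early exit for the side objective

    def forward(self, x):
        h = torch.relu(self.shared(x))
        main = self.head(torch.relu(self.extra(h)))
        side = self.side_head(h)            # prediction from the shared block only
        return main, side

def large_device_loss(model, x, y, alpha=0.5):
    main, side = model(x)
    # Side objective: keep the shared block predictive on its own, so that
    # averaging it with small devices' weights remains meaningful.
    return F.cross_entropy(main, y) + alpha * F.cross_entropy(side, y)

def fed_avg_shared(models):
    """Average only the parameters every architecture has in common."""
    with torch.no_grad():
        avg_w = torch.stack([m.shared.weight for m in models]).mean(0)
        avg_b = torch.stack([m.shared.bias for m in models]).mean(0)
        for m in models:
            m.shared.weight.copy_(avg_w)
            m.shared.bias.copy_(avg_b)

models = [SmallNet(), LargeNet()]
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
print(large_device_loss(models[1], x, y).item())
fed_avg_shared(models)
```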
arXiv Detail & Related papers (2022-07-07T01:08:35Z)
- Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes the distilled data to guide networks toward a state similar to that of networks trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
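The core of this trajectory-matching objective can be sketched as follows (a minimal, hypothetical re-implementation with a tiny linear model; `theta_start` and `theta_target` stand in for expert checkpoints saved while training on real data): unroll a few differentiable SGD steps on the distilled data, then penalize the normalized distance to the later expert checkpoint.

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    """Tiny linear classifier, written functionally so we can unroll SGD."""
    w, b = params
    return x @ w + b

def trajectory_matching_loss(syn_x, syn_y, theta_start, theta_target,
                             n_steps=10, inner_lr=0.05):
    """Train a student from the expert checkpoint theta_start on the distilled
    data, then measure how close it gets to the later checkpoint theta_target."""
    params = [p.clone().requires_grad_(True) for p in theta_start]
    for _ in range(n_steps):                # unrolled, differentiable SGD
        loss = F.cross_entropy(forward(params, syn_x), syn_y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - inner_lr * g for p, g in zip(params, grads)]
    num = sum(((p - t) ** 2).sum() for p, t in zip(params, theta_target))
    den = sum(((s - t) ** 2).sum() for s, t in zip(theta_start, theta_target))
    return num / den                        # normalized parameter distance

# Toy expert checkpoints (in practice: saved from training on real data).
d, c = 20, 5
theta_start = [torch.randn(d, c), torch.zeros(c)]
theta_target = [t + 0.1 * torch.randn_like(t) for t in theta_start]

# The distilled data is the thing being optimized.
syn_x = torch.randn(25, d, requires_grad=True)
syn_y = torch.arange(c).repeat(5)
opt = torch.optim.Adam([syn_x], lr=0.01)

for step in range(50):
    opt.zero_grad()
    loss = trajectory_matching_loss(syn_x, syn_y, theta_start, theta_target)
    loss.backward()
    opt.step()
    if step % 10 == 0:
        print(f"step {step}: matching loss = {loss.item():.4f}")
```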
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
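Of the three ingredients, the Implicit Function Theorem is the least self-explanatory, so here is a hedged sketch of an IFT hypergradient with a truncated Neumann-series inverse-Hessian-vector product (a standard construction, not necessarily this article's exact algorithm; all names below are illustrative):

```python
import torch
import torch.nn.functional as F

def ift_hypergrad(loss_train, loss_val, params, hparams, lr=0.1, iters=20):
    """IFT hypergradient: -(d2L_tr/dh dtheta) (d2L_tr/dtheta2)^-1 (dL_val/dtheta),
    with the inverse Hessian-vector product from a truncated Neumann series."""
    v = torch.autograd.grad(loss_val, params, retain_graph=True)
    g = torch.autograd.grad(loss_train, params, create_graph=True)
    p = [vi.clone() for vi in v]
    acc = [vi.clone() for vi in v]
    for _ in range(iters):  # p_{k+1} = (I - lr*H) p_k; the sum approximates H^-1 v / lr
        hvp = torch.autograd.grad(g, params, grad_outputs=p, retain_graph=True)
        p = [pi - lr * h for pi, h in zip(p, hvp)]
        acc = [ai + pi for ai, pi in zip(acc, p)]
    ihvp = [lr * ai for ai in acc]
    dot = sum((gi * qi.detach()).sum() for gi, qi in zip(g, ihvp))
    return [-h for h in torch.autograd.grad(dot, hparams)]

# Toy setup: synthetic data (the hyperparameters) trains a linear model;
# the hypergradient says how to change it to reduce real validation loss.
d, c = 10, 3
w = torch.randn(d, c, requires_grad=True)       # model parameters theta
syn_x = torch.randn(15, d, requires_grad=True)  # hyperparameters lambda
syn_y = torch.arange(c).repeat(5)
val_x, val_y = torch.randn(30, d), torch.randint(0, c, (30,))

loss_train = F.cross_entropy(syn_x @ w, syn_y)
loss_val = F.cross_entropy(val_x @ w, val_y)
hg = ift_hypergrad(loss_train, loss_val, [w], [syn_x])
print(hg[0].shape)  # gradient w.r.t. the synthetic images
```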
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
- DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction [20.51885543358098]
We propose DHEN - a deep and hierarchical ensemble architecture that can leverage the strengths of heterogeneous interaction modules and learn a hierarchy of interactions of different orders.
Experiments on a large-scale dataset from CTR prediction tasks show a 0.27% improvement in the Normalized Entropy of predictions and 1.2x better training throughput than the state-of-the-art baseline.
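A toy version of such a layer might look as follows (an illustrative sketch based on the abstract, not the paper's production system): each DHEN-style layer runs heterogeneous interaction modules in parallel over per-feature embeddings, and stacking layers with residual connections raises the interaction order.

```python
import torch
import torch.nn as nn

class DotInteraction(nn.Module):
    """Pairwise dot-product interactions, projected back to embedding size."""
    def __init__(self, num_feats, dim):
        super().__init__()
        self.num_feats, self.dim = num_feats, dim
        self.proj = nn.Linear(num_feats * num_feats, num_feats * dim)

    def forward(self, x):                  # x: (batch, num_feats, dim)
        inter = x @ x.transpose(1, 2)      # (batch, F, F) dot products
        out = self.proj(inter.flatten(1))
        return out.view(-1, self.num_feats, self.dim)

class MLPInteraction(nn.Module):
    def __init__(self, num_feats, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class DHENLayer(nn.Module):
    """One ensemble layer: run heterogeneous modules in parallel, sum their
    outputs, and add a residual so stacking layers raises interaction order."""
    def __init__(self, num_feats, dim):
        super().__init__()
        self.modules_ = nn.ModuleList(
            [DotInteraction(num_feats, dim), MLPInteraction(num_feats, dim)])

    def forward(self, x):
        return x + sum(m(x) for m in self.modules_)

# Stack of layers over per-feature embeddings, as in a CTR model.
num_feats, dim = 8, 16
dhen = nn.Sequential(DHENLayer(num_feats, dim), DHENLayer(num_feats, dim))
x = torch.randn(4, num_feats, dim)   # embedded categorical features
print(dhen(x).shape)                 # (4, 8, 16)
```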
arXiv Detail & Related papers (2022-03-11T21:19:31Z)
- Towards Federated Bayesian Network Structure Learning with Continuous Optimization [14.779035801521717]
We present a cross-silo federated learning approach to estimate the structure of a Bayesian network.
We develop a distributed structure learning method based on continuous optimization.
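A common continuous formulation (assumed here for illustration; the paper's exact algorithm may differ) scores a weighted adjacency matrix with a least-squares fit plus the NOTEARS acyclicity penalty h(W) = tr(e^{W*W}) - d, with each silo contributing only gradients computed on its local data:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS constraint: h(W) = tr(exp(W*W)) - d, zero iff W is a DAG.
    Returns the value and its gradient (expm(W*W)^T * 2W)."""
    E = expm(W * W)
    return np.trace(E) - W.shape[0], (E.T * 2 * W)

def local_grad(W, X):
    """Gradient of the least-squares score 1/(2n)||X - XW||^2 on one silo's data."""
    n = X.shape[0]
    R = X - X @ W
    return -(X.T @ R) / n

# Cross-silo federated loop: silos never share raw data, only gradients.
rng = np.random.default_rng(0)
d = 5
silos = [rng.normal(size=(100, d)) for _ in range(3)]  # local datasets
W = np.zeros((d, d))
rho, lr = 1.0, 0.01

for step in range(500):
    g_score = np.mean([local_grad(W, X) for X in silos], axis=0)  # server aggregates
    h, g_h = acyclicity(W)
    W -= lr * (g_score + rho * h * g_h)  # penalized objective, augmented-Lagrangian style
    np.fill_diagonal(W, 0)               # no self-loops
    if step % 100 == 0:
        rho *= 2                         # gradually enforce acyclicity

print("acyclicity h(W) =", acyclicity(W)[0])
```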
arXiv Detail & Related papers (2021-10-18T14:36:05Z)
- Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
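One simple mechanism that realizes this (an illustrative sketch, not necessarily the paper's construction) is to give every unit a learnable gate and apply L1 pressure to the gates, so the effective width, and with it part of the topology, is learned jointly with the weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedLayer(nn.Module):
    """Linear layer whose output units can be switched off by learnable gates."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
        self.gate = nn.Parameter(torch.ones(d_out))  # one gate per unit

    def forward(self, x):
        return torch.relu(self.lin(x)) * self.gate

net = nn.Sequential(GatedLayer(20, 64), GatedLayer(64, 64), nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x, y = torch.randn(128, 20), torch.randint(0, 3, (128,))

for _ in range(200):
    opt.zero_grad()
    task_loss = F.cross_entropy(net(x), y)
    # L1 pressure on the gates drives unneeded units to zero, so the
    # network's effective size is learned alongside its weights.
    sparsity = sum(m.gate.abs().sum() for m in net if isinstance(m, GatedLayer))
    (task_loss + 1e-3 * sparsity).backward()
    opt.step()

for i, m in enumerate(net):
    if isinstance(m, GatedLayer):
        kept = int((m.gate.abs() > 1e-2).sum())
        print(f"layer {i}: {kept}/{m.gate.numel()} units survive")
```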
arXiv Detail & Related papers (2020-06-22T12:46:44Z)
- Dataset Condensation with Gradient Matching [36.14340188365505]
We propose a training set synthesis technique for data-efficient learning, called Dataset Condensation, which learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch.
We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
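The heart of the method is a gradient-matching objective. A minimal sketch follows (toy linear model, one synthetic sample per class; the full method also alternates with model updates and re-samples random initializations so the data is not tied to one network state):

```python
import torch
import torch.nn.functional as F

def grad_match_loss(model, real_x, real_y, syn_x, syn_y):
    """Layer-wise cosine distance between gradients on real and synthetic batches."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_real = torch.autograd.grad(
        F.cross_entropy(model(real_x), real_y), params)
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), params, create_graph=True)
    loss = 0.0
    for gr, gs in zip(g_real, g_syn):
        gr, gs = gr.flatten(), gs.flatten()
        loss = loss + (1 - F.cosine_similarity(gr.detach(), gs, dim=0))
    return loss

# Toy condensation loop: a tiny linear model, 1 synthetic image per class.
d, c = 32, 10
model = torch.nn.Linear(d, c)
real_x, real_y = torch.randn(256, d), torch.randint(0, c, (256,))
syn_x = torch.randn(c, d, requires_grad=True)  # the condensed "images"
syn_y = torch.arange(c)
opt = torch.optim.Adam([syn_x], lr=0.1)        # only the synthetic data is updated

for _ in range(100):
    opt.zero_grad()
    grad_match_loss(model, real_x, real_y, syn_x, syn_y).backward()
    opt.step()

print("final matching loss:",
      grad_match_loss(model, real_x, real_y, syn_x, syn_y).item())
```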
arXiv Detail & Related papers (2020-06-10T16:30:52Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.