Revisiting Data Augmentation in Model Compression: An Empirical and
Comprehensive Study
- URL: http://arxiv.org/abs/2305.13232v1
- Date: Mon, 22 May 2023 17:05:06 GMT
- Title: Revisiting Data Augmentation in Model Compression: An Empirical and
Comprehensive Study
- Authors: Muzhou Yu, Linfeng Zhang and Kaisheng Ma
- Abstract summary: In this paper, we revisit the usage of data augmentation in model compression.
We show that models of different sizes prefer data augmentation with different magnitudes.
The prediction of a pre-trained large model can be utilized to measure the difficulty of data augmentation.
- Score: 17.970216875558638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The excellent performance of deep neural networks usually comes with a
large number of parameters and computations, which limits their usage on
resource-limited edge devices. To address this issue, numerous methods such as
pruning, quantization, and knowledge distillation have been proposed to
compress neural networks and have achieved significant breakthroughs. However,
most of these compression methods focus on the architecture or the training
method of neural networks while ignoring the influence of data augmentation. In this
paper, we revisit the usage of data augmentation in model compression and give
a comprehensive study on the relation between model sizes and their optimal
data augmentation policy. In summary, we make the following three
observations: (A) Models of different sizes prefer data augmentation with
different magnitudes. Hence, in iterative pruning, data augmentation with
varying magnitudes leads to better performance than data augmentation with a
consistent magnitude. (B) Data augmentation with a high magnitude may
significantly improve the performance of large models but harm the performance
of small models. Fortunately, small models can still benefit from strong data
augmentations by first learning them with "additional parameters" and then
discarding these "additional parameters" during inference. (C) The prediction of a
pre-trained large model can be utilized to measure the difficulty of data
augmentation. Thus, it can serve as a criterion to design better data
augmentation policies. We hope this paper may promote more research on the
usage of data augmentation in model compression.
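To make observations (A) and (C) concrete, here is a minimal PyTorch sketch of how a pre-trained large model's predictions could score augmentation difficulty, and how that score could drive a varying-magnitude schedule across pruning iterations. This is an illustrative assumption, not the authors' implementation: the noise-plus-erasing augmentation, the cross-entropy difficulty proxy, and the shrinking difficulty budget are hypothetical stand-ins for the policies studied in the paper.

```python
# Illustrative sketch (not the authors' code): a pre-trained "large" model
# scores how hard an augmented batch is, and that score picks a per-iteration
# augmentation magnitude during iterative pruning.
import torch
import torch.nn as nn
import torch.nn.functional as F


def augment(x: torch.Tensor, magnitude: float) -> torch.Tensor:
    """Hypothetical stand-in for a RandAugment-style policy: larger
    magnitude adds more noise and erases a larger random patch."""
    x = x + magnitude * torch.randn_like(x)
    _, _, h, w = x.shape
    size = max(1, int(magnitude * h))
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    x[:, :, top:top + size, left:left + size] = 0.0
    return x


@torch.no_grad()
def augmentation_difficulty(teacher: nn.Module, x_aug: torch.Tensor,
                            y: torch.Tensor) -> float:
    """Observation (C): use the pre-trained large model's prediction
    (here, its cross-entropy on the augmented batch) as a difficulty score."""
    teacher.eval()
    return F.cross_entropy(teacher(x_aug), y).item()


def pick_magnitude(teacher, x, y, candidates=(0.1, 0.3, 0.5, 0.7),
                   budget: float = 2.5) -> float:
    """Observation (A): scan from mild to strong and keep the strongest
    candidate magnitude whose measured difficulty fits within the budget."""
    chosen = candidates[0]
    for m in candidates:
        if augmentation_difficulty(teacher, augment(x, m), y) <= budget:
            chosen = m
    return chosen


if __name__ == "__main__":
    teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy teacher
    x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
    # Tighten the budget as the pruned model shrinks, so later (smaller)
    # models receive milder augmentation -- a varying-magnitude schedule.
    for it, budget in enumerate([3.0, 2.7, 2.4]):
        m = pick_magnitude(teacher, x, y, budget=budget)
        print(f"pruning iteration {it}: magnitude {m}")
```

In practice the toy teacher would be an actual pre-trained large model and the augmentation a real policy (e.g., RandAugment) whose magnitude parameter is scheduled per pruning iteration.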
Related papers
- How Does Data Diversity Shape the Weight Landscape of Neural Networks? [2.89287673224661]
We investigate the impact of dropout, weight decay, and noise augmentation on the parameter space of neural networks.
We observe that diverse data influences the weight landscape in a similar fashion as dropout.
We conclude that synthetic data can bring more diversity into real input data, resulting in better performance on out-of-distribution test instances.
arXiv Detail & Related papers (2024-10-18T16:57:05Z)
- A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation [0.6707149143800017]
This study presents and explores a generative augmentation framework of social network advertising data.
Our framework explores three generative models for data augmentation - Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs).
arXiv Detail & Related papers (2024-04-22T01:16:11Z)
- A Survey on Data Augmentation in Large Model Era [16.05117556207015]
Large models, encompassing large language and diffusion models, have shown exceptional promise in approximating human-level intelligence.
With continuous updates to these models, the existing reservoir of high-quality data may soon be depleted.
This paper offers an exhaustive review of large model-driven data augmentation methods.
arXiv Detail & Related papers (2024-01-27T14:19:33Z)
- Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel method for constructing feed-forward neural networks based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the degree of transfer learning by comparing the refined model with the original network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z) - DualAug: Exploiting Additional Heavy Augmentation with OOD Data
Rejection [77.6648187359111]
We propose a novel data augmentation method, named DualAug, to keep the augmentation in distribution as much as possible at reasonable time and computational cost.
Experiments on supervised image classification benchmarks show that DualAug improves various automated data augmentation methods.
arXiv Detail & Related papers (2023-10-12T08:55:10Z) - Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors contributing to multi-epoch degradation, finding that dataset size, model parameters, and training objectives are the most significant.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Exploring the Effects of Data Augmentation for Drivable Area Segmentation [0.0]
We focus on investigating the benefits of data augmentation by analyzing pre-existing image datasets.
Our results show that the performance and robustness of existing state-of-the-art (SOTA) models can be improved dramatically.
arXiv Detail & Related papers (2022-08-06T03:39:37Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.