Efficiently Robustify Pre-trained Models
- URL: http://arxiv.org/abs/2309.07499v1
- Date: Thu, 14 Sep 2023 08:07:49 GMT
- Title: Efficiently Robustify Pre-trained Models
- Authors: Nishant Jain, Harkirat Behl, Yogesh Singh Rawat, Vibhav Vineet
- Abstract summary: robustness of large scale models towards real-world settings is still a less-explored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss on how complete model fine-tuning based existing robustification schemes might not be a scalable option given very large scale networks.
- Score: 18.392732966487582
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A recent trend in deep learning algorithms has been towards training large
scale models, having high parameter count and trained on big dataset. However,
robustness of such large scale models towards real-world settings is still a
less-explored topic. In this work, we first benchmark the performance of these
models under different perturbations and datasets thereby representing
real-world shifts, and highlight their degrading performance under these
shifts. We then discuss on how complete model fine-tuning based existing
robustification schemes might not be a scalable option given very large scale
networks and can also lead them to forget some of the desired characterstics.
Finally, we propose a simple and cost-effective method to solve this problem,
inspired by knowledge transfer literature. It involves robustifying smaller
models, at a lower computation cost, and then use them as teachers to tune a
fraction of these large scale networks, reducing the overall computational
overhead. We evaluate our proposed method under various vision perturbations
including ImageNet-C,R,S,A datasets and also for transfer learning, zero-shot
evaluation setups on different datasets. Benchmark results show that our method
is able to induce robustness to these large scale models efficiently, requiring
significantly lower time and also preserves the transfer learning, zero-shot
properties of the original model which none of the existing methods are able to
achieve.
Related papers
- An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning [0.15833270109954137]
We present up to eight different methods to reduce the size of a training dataset.
We also develop a Python package to apply them.
We experimentally compare how these data reduction methods affect the representativeness of the reduced dataset.
arXiv Detail & Related papers (2024-03-22T12:06:40Z) - Diffusion-based Neural Network Weights Generation [85.6725307453325]
We propose an efficient and adaptive transfer learning scheme through dataset-conditioned pretrained weights sampling.
Specifically, we use a latent diffusion model with a variational autoencoder that can reconstruct the neural network weights.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - A Systematic Approach to Robustness Modelling for Deep Convolutional
Neural Networks [0.294944680995069]
Recent work raises questions about the ability for even larger models to generalize to data outside of the controlled train and test sets.
We provide a method that uses induced failures to model the probability of failure as a function of time.
We examine the various trade-offs between cost, robustness, latency, and reliability to find that larger models do not significantly aid in adversarial robustness.
arXiv Detail & Related papers (2024-01-24T19:12:37Z) - Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the transfer learning level comparing the refined model and the original one training from scratch a neural network.
arXiv Detail & Related papers (2023-12-16T23:23:16Z) - Bad Students Make Great Teachers: Active Learning Accelerates
Large-Scale Visual Understanding [9.655434542591815]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z) - A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models which are able to complement state-of-the-art model uncertainty.
arXiv Detail & Related papers (2021-11-09T03:23:05Z) - Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z) - Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.