Efficiently Robustify Pre-trained Models
- URL: http://arxiv.org/abs/2309.07499v1
- Date: Thu, 14 Sep 2023 08:07:49 GMT
- Title: Efficiently Robustify Pre-trained Models
- Authors: Nishant Jain, Harkirat Behl, Yogesh Singh Rawat, Vibhav Vineet
- Abstract summary: Robustness of large-scale models in real-world settings is still a less-explored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss how existing robustification schemes based on complete model fine-tuning might not be a scalable option for very large-scale networks.
- Score: 18.392732966487582
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A recent trend in deep learning has been towards training large-scale
models with high parameter counts on big datasets. However, the robustness of
such large-scale models in real-world settings is still a less-explored topic.
In this work, we first benchmark the performance of these models under different
perturbations and datasets representing real-world shifts, and highlight their
degrading performance under these shifts. We then discuss how existing
robustification schemes based on complete model fine-tuning might not be a
scalable option for very large-scale networks and can also cause them to forget
some of their desired characteristics. Finally, we propose a simple and
cost-effective method to solve this problem, inspired by the knowledge transfer
literature. It involves robustifying smaller models at a lower computational
cost and then using them as teachers to tune a fraction of these large-scale
networks, reducing the overall computational overhead. We evaluate our proposed
method under various vision perturbations, including the ImageNet-C, R, S, and A
datasets, as well as in transfer learning and zero-shot evaluation setups on
different datasets. Benchmark results show that our method induces robustness in
these large-scale models efficiently, requiring significantly less time, and
also preserves the transfer learning and zero-shot properties of the original
model, which none of the existing methods achieve.
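The abstract describes the recipe only at a high level: robustify a small model first, then use it as a teacher to tune a small fraction of the large pre-trained network. The snippet below is a minimal PyTorch sketch of that general idea, not the authors' implementation; the data loader, the choice of trainable parameter names, the distillation temperature, and the assumption that teacher and student share a label space are all illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def robustify_by_distillation(large_model, small_teacher, loader,
                              trainable_keywords=("blocks.11", "head"),
                              epochs=1, lr=1e-4, temperature=2.0, device="cuda"):
    """Tune only a small fraction of `large_model` so that its predictions on
    (possibly perturbed) inputs match those of an already-robustified smaller
    teacher. Hypothetical sketch; names and hyperparameters are placeholders."""
    large_model.to(device)
    small_teacher.to(device).eval()

    # Freeze everything, then unfreeze only the named fraction of parameters.
    for name, p in large_model.named_parameters():
        p.requires_grad = any(k in name for k in trainable_keywords)
    trainable = [p for p in large_model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(trainable, lr=lr)

    for _ in range(epochs):
        for images, _ in loader:               # labels unused: purely teacher-driven
            images = images.to(device)
            with torch.no_grad():
                t_logits = small_teacher(images)
            s_logits = large_model(images)
            # Temperature-scaled KL distillation loss between student and teacher.
            loss = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=-1),
                F.softmax(t_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
    return large_model
```

In the paper's setting the tuning data would presumably include the perturbations being targeted, and extra care would be needed to preserve the transfer learning and zero-shot properties the abstract emphasizes; the sketch omits those details.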
Related papers
- Scaling Laws for Pre-training Agents and World Models [22.701210075508147]
Performance of embodied agents has been shown to improve by increasing model parameters, dataset size, and compute.
This paper characterizes the role of scale in these tasks more precisely.
arXiv Detail & Related papers (2024-11-07T04:57:40Z)
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes (a minimal sketch of this logit-composition idea appears after the list below).
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network construction method based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the degree of transfer learning by comparing the refined model with the original one trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z)
- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.112203072394648]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models which are able to complement state-of-the-art model uncertainty.
arXiv Detail & Related papers (2021-11-09T03:23:05Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
- Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in a mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
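As a companion to the "Transferable Post-training via Inverse Value Learning" entry above, the sketch below illustrates one way a logit-level value network could be composed with a larger pre-trained model at inference. It is based only on the summary above, not the paper's code: the class name, the assumption that the value network maps the same inputs to a logit-space correction, and the shared-vocabulary requirement are all illustrative assumptions.

```python
import torch

class ValueComposedModel(torch.nn.Module):
    """Minimal sketch: compose a frozen large pre-trained model with a small
    'value network' that was trained on a small base model to predict the
    logit-level change induced by post-training. Assumes both models produce
    logits over the same vocabulary."""

    def __init__(self, large_model, value_network):
        super().__init__()
        self.large_model = large_model.eval()
        self.value_network = value_network.eval()
        for p in self.parameters():
            p.requires_grad = False  # inference-only composition, nothing is tuned

    @torch.no_grad()
    def forward(self, input_ids):
        base_logits = self.large_model(input_ids)     # (batch, seq, vocab)
        # During training, the value network's targets could be the difference
        # between a post-trained small model's logits and the base small model's.
        delta_logits = self.value_network(input_ids)  # predicted post-training shift
        return base_logits + delta_logits             # corrected logits at inference
```

Usage would look like `composed = ValueComposedModel(big_lm, value_net)` followed by `logits = composed(input_ids)`; whether simple addition is the exact composition rule used in the paper cannot be determined from the summary alone.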
This list is automatically generated from the titles and abstracts of the papers in this site.