SADT: Combining Sharpness-Aware Minimization with Self-Distillation for
Improved Model Generalization
- URL: http://arxiv.org/abs/2211.00310v1
- Date: Tue, 1 Nov 2022 07:30:53 GMT
- Title: SADT: Combining Sharpness-Aware Minimization with Self-Distillation for
Improved Model Generalization
- Authors: Masud An-Nur Islam Fahim, Jani Boutellier
- Abstract summary: Methods for improving deep neural network training times and model generalizability consist of various data augmentation, regularization, and optimization approaches.
This work jointly considers two recent training strategies that address model generalizability: sharpness-aware minimization and self-distillation.
The experimental section of this work shows that SADT consistently outperforms previously published training strategies in model convergence time, test-time performance, and model generalizability.
- Score: 4.365720395124051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Methods for improving deep neural network training times and model
generalizability consist of various data augmentation, regularization, and
optimization approaches, which tend to be sensitive to hyperparameter settings
and make reproducibility more challenging. This work jointly considers two
recent training strategies that address model generalizability: sharpness-aware
minimization, and self-distillation, and proposes the novel training strategy
of Sharpness-Aware Distilled Teachers (SADT). The experimental section of this
work shows that SADT consistently outperforms previously published training
strategies in model convergence time, test-time performance, and model
generalizability over various neural architectures, datasets, and
hyperparameter settings.
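To make the combination concrete, the minimal sketch below pairs a SAM-style two-step update (perturb the weights along the normalized gradient within a radius rho, then compute the update gradient at the perturbed point) with a soft-label distillation term taken from an exponential-moving-average teacher copy of the same network. The function name sam_distill_step, the EMA-teacher construction, and the rho, alpha, tau, and ema values are illustrative assumptions, not the paper's exact SADT procedure.

```python
# Hypothetical sketch: SAM-style perturbation + self-distillation.
# Hyperparameters and the EMA teacher are illustrative assumptions.
import torch
import torch.nn.functional as F

def sam_distill_step(model, teacher, optimizer, x, y,
                     rho=0.05, alpha=0.5, tau=2.0, ema=0.999):
    """One training step: sharpness-aware perturbation plus a distillation loss."""
    # 1. First forward/backward pass to find the ascent direction.
    loss = F.cross_entropy(model(x), y)
    loss.backward()

    # Climb to the local worst-case weights w + rho * g / ||g||.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    optimizer.zero_grad()

    # 2. Second pass at the perturbed point: task loss + distillation loss.
    logits = model(x)
    with torch.no_grad():
        teacher_logits = teacher(x)
    task_loss = F.cross_entropy(logits, y)
    distill_loss = F.kl_div(
        F.log_softmax(logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau
    total = (1 - alpha) * task_loss + alpha * distill_loss
    total.backward()

    # 3. Undo the perturbation, then apply the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()

    # 4. Update the teacher as an exponential moving average of the student.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), model.parameters()):
            pt.mul_(ema).add_(ps, alpha=1 - ema)
    return total.item()
```

In practice the teacher could simply start as a frozen deep copy of the student (e.g. copy.deepcopy(model) with gradients disabled); the weight alpha then controls how strongly the distilled teacher guides the student relative to the task loss.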
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the VLMs' zero-shot generalization; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z)
- EsaCL: Efficient Continual Learning of Sparse Models [10.227171407348326]
A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks.
We propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power.
arXiv Detail & Related papers (2024-01-11T04:59:44Z)
- Not All Steps are Equal: Efficient Generation with Progressive Diffusion Models [62.155612146799314]
We propose a novel two-stage training strategy termed Step-Adaptive Training.
In the initial stage, a base denoising model is trained to encompass all timesteps.
We partition the timesteps into distinct groups, fine-tuning the model within each group to achieve specialized denoising capabilities.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
- MAST: Model-Agnostic Sparsified Training [4.962431253126472]
We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.
Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators.
We present several variants of the Stochastic Gradient Descent (SGD) method adapted to the new problem formulation.
arXiv Detail & Related papers (2023-11-27T18:56:03Z)
- Towards More Robust and Accurate Sequential Recommendation with Cascade-guided Adversarial Training [54.56998723843911]
Two properties unique to the nature of sequential recommendation models may impair their robustness.
We propose Cascade-guided Adversarial training, a new adversarial training procedure that is specifically designed for sequential recommendation models.
arXiv Detail & Related papers (2023-04-11T20:55:02Z)
- Homotopy-based training of NeuralODEs for accurate dynamics discovery [0.0]
We develop a new training method for NeuralODEs, based on synchronization and homotopy optimization.
We show that synchronizing the model dynamics and the training data tames the originally irregular loss landscape.
Our method achieves competitive or better training loss while often requiring less than half the number of training epochs.
arXiv Detail & Related papers (2022-10-04T06:32:45Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Improved Adversarial Training via Learned Optimizer [101.38877975769198]
We propose a framework to improve the robustness of adversarial training models.
By co-training the optimizer's parameters and the model's weights, the proposed framework consistently improves robustness and adapts step sizes for the update directions.
arXiv Detail & Related papers (2020-04-25T20:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.