KL Regularized Normalization Framework for Low Resource Tasks
- URL: http://arxiv.org/abs/2212.11275v1
- Date: Wed, 21 Dec 2022 05:59:25 GMT
- Title: KL Regularized Normalization Framework for Low Resource Tasks
- Authors: Neeraj Kumar, Ankur Narang and Brejesh Lall
- Abstract summary: It is difficult to obtain a large quantity of supervised data due to the limited availability of resources and time.
We propose KullbackLeibler(KL) Regularized normalization (KL-Norm) which make the normalized data well behaved and helps in better generalization.
- Score: 18.88247001843119
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained models, such as Bert, GPT, and Wav2Vec, have demonstrated
great potential for learning representations that are transferable to a wide
variety of downstream tasks . It is difficult to obtain a large quantity of
supervised data due to the limited availability of resources and time. In light
of this, a significant amount of research has been conducted in the area of
adopting large pre-trained datasets for diverse downstream tasks via fine
tuning, linear probing, or prompt tuning in low resource settings.
Normalization techniques are essential for accelerating training and improving
the generalization of deep neural networks and have been successfully used in a
wide variety of applications. A lot of normalization techniques have been
proposed but the success of normalization in low resource downstream NLP and
speech tasks is limited. One of the reasons is the inability to capture
expressiveness by rescaling parameters of normalization. We propose
KullbackLeibler(KL) Regularized normalization (KL-Norm) which make the
normalized data well behaved and helps in better generalization as it reduces
over-fitting, generalises well on out of domain distributions and removes
irrelevant biases and features with negligible increase in model parameters and
memory overheads. Detailed experimental evaluation on multiple low resource NLP
and speech tasks, demonstrates the superior performance of KL-Norm as compared
to other popular normalization and regularization techniques.
Related papers
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z) - QT-DoG: Quantization-aware Training for Domain Generalization [58.439816306817306]
We propose Quantization-aware Training for Domain Generalization (QT-DoG)
QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
We demonstrate that QT-DoG generalizes across various datasets, architectures, and quantization algorithms.
arXiv Detail & Related papers (2024-10-08T13:21:48Z) - Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization [28.977757627384165]
Domain Domain (DG) aims to avoid the performance degradation of the model when the distribution shift between the limited training data and unseen test data occurs.
Recently, foundation models with enormous parameters have been pre-trained with huge datasets, demonstrating strong generalization ability.
Our framework achieves SOTA performance on five DG benchmarks, while only requiring training a small number of parameters without adding additional testing cost.
arXiv Detail & Related papers (2024-07-21T07:50:49Z) - Quantized Prompt for Efficient Generalization of Vision-Language Models [27.98205540768322]
Large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields.
During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting.
In this paper, we explore quantization for regularizing vision-language model, which is quite efficiency and effective.
arXiv Detail & Related papers (2024-07-15T13:19:56Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT)
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z) - DR-Tune: Improving Fine-tuning of Pretrained Visual Models by
Distribution Regularization with Semantic Calibration [38.4461170690033]
We propose a novel fine-tuning framework, namely distribution regularization with semantic calibration (DR-Tune)
DR-Tune employs distribution regularization by enforcing the downstream task head to decrease its classification error on the pretrained feature distribution.
To alleviate the interference by semantic drift, we develop the semantic calibration (SC) module.
arXiv Detail & Related papers (2023-08-23T10:59:20Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Variational Information Bottleneck for Effective Low-Resource
Fine-Tuning [40.66716433803935]
We propose to use Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks.
We show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets.
arXiv Detail & Related papers (2021-06-10T03:08:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.