Quantified Task Misalignment to Inform PEFT: An Exploration of Domain
Generalization and Catastrophic Forgetting in CLIP
- URL: http://arxiv.org/abs/2402.09613v1
- Date: Wed, 14 Feb 2024 23:01:03 GMT
- Title: Quantified Task Misalignment to Inform PEFT: An Exploration of Domain
Generalization and Catastrophic Forgetting in CLIP
- Authors: Laura Niss, Kevin Vogt-Lowell, Theodoros Tsiligkaridis
- Abstract summary: We analyze the relation between task difficulty in the CLIP model and the performance of several simple parameter-efficient fine-tuning methods.
A method that trains only a subset of attention weights, which we call A-CLIP, yields a balance between domain generalization and catastrophic forgetting.
- Score: 7.550566004119157
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Foundations models are presented as generalists that often perform well over
a myriad of tasks. Fine-tuning these models, even on limited data, provides an
additional boost in task-specific performance but often at the cost of their
wider generalization, an effect termed catastrophic forgetting. In this paper,
we analyze the relation between task difficulty in the CLIP model and the
performance of several simple parameter-efficient fine-tuning methods through
the lens of domain generalization and catastrophic forgetting. We provide
evidence that the silhouette score of the zero-shot image and text embeddings
is a better measure of task difficulty than the average cosine similarity of
correct image/label embeddings, and discuss observable relationships between
task difficulty, fine-tuning method, domain generalization, and catastrophic
forgetting. Additionally, the averaged results across tasks and performance
measures demonstrate that a simplified method that trains only a subset of
attention weights, which we call A-CLIP, yields a balance between domain
generalization and catastrophic forgetting.
Related papers
- Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation [1.124958340749622]
We introduce a multitask learning framework to leverage correlations among the counting, detection, and segmentation tasks.
We develop a cross-position cut-and-paste for label augmentation and an entropy-based pseudo-label selection.
The proposed model is capable of significantly outperforming UDA methods and produces comparable performance as the supervised counterpart.
arXiv Detail & Related papers (2024-03-31T12:22:23Z) - Data-CUBE: Data Curriculum for Instruction-based Sentence Representation
Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Task-guided Disentangled Tuning for Pretrained Language Models [16.429787408467703]
We propose Task-guided Disentangled Tuning (TDT) for pretrained language models (PLMs)
TDT enhances the generalization of representations by disentangling task-relevant signals from entangled representations.
Experimental results on GLUE and CLUE benchmarks show that TDT gives consistently better results than fine-tuning with different PLMs.
arXiv Detail & Related papers (2022-03-22T03:11:39Z) - A Multi-Task Deep Learning Framework for Building Footprint Segmentation [0.0]
We propose a joint optimization scheme for the task of building footprint delineation.
We also introduce two auxiliary tasks; image reconstruction and building footprint boundary segmentation.
In particular, we propose a deep multi-task learning (MTL) based unified fully convolutional framework.
arXiv Detail & Related papers (2021-04-19T15:07:27Z) - A Simple Baseline for Semi-supervised Semantic Segmentation with Strong
Data Augmentation [74.8791451327354]
We propose a simple yet effective semi-supervised learning framework for semantic segmentation.
A set of simple design and training techniques can collectively improve the performance of semi-supervised semantic segmentation significantly.
Our method achieves state-of-the-art results in the semi-supervised settings on the Cityscapes and Pascal VOC datasets.
arXiv Detail & Related papers (2021-04-15T06:01:39Z) - Supervised Contrastive Learning for Pre-trained Language Model
Fine-tuning [23.00300794016583]
State-of-the-art natural language understanding classification models follow two-stages.
We propose a supervised contrastive learning (SCL) objective for the fine-tuning stage.
Our proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data.
arXiv Detail & Related papers (2020-11-03T01:10:39Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via
Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL)
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.