Quantified Task Misalignment to Inform PEFT: An Exploration of Domain
Generalization and Catastrophic Forgetting in CLIP
- URL: http://arxiv.org/abs/2402.09613v1
- Date: Wed, 14 Feb 2024 23:01:03 GMT
- Title: Quantified Task Misalignment to Inform PEFT: An Exploration of Domain
Generalization and Catastrophic Forgetting in CLIP
- Authors: Laura Niss, Kevin Vogt-Lowell, Theodoros Tsiligkaridis
- Abstract summary: We analyze the relation between task difficulty in the CLIP model and the performance of several simple parameter-efficient fine-tuning methods.
A method that trains only a subset of attention weights, which we call A-CLIP, yields a balance between domain generalization and catastrophic forgetting.
- Score: 7.550566004119157
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Foundation models are presented as generalists that often perform well over
a myriad of tasks. Fine-tuning these models, even on limited data, provides an
additional boost in task-specific performance but often at the cost of their
wider generalization, an effect termed catastrophic forgetting. In this paper,
we analyze the relation between task difficulty in the CLIP model and the
performance of several simple parameter-efficient fine-tuning methods through
the lens of domain generalization and catastrophic forgetting. We provide
evidence that the silhouette score of the zero-shot image and text embeddings
is a better measure of task difficulty than the average cosine similarity of
correct image/label embeddings, and discuss observable relationships between
task difficulty, fine-tuning method, domain generalization, and catastrophic
forgetting. Additionally, the averaged results across tasks and performance
measures demonstrate that a simplified method that trains only a subset of
attention weights, which we call A-CLIP, yields a balance between domain
generalization and catastrophic forgetting.
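The two task-difficulty measures compared in the abstract can be illustrated with a short sketch. The following minimal Python example is not the authors' code and reflects one plausible reading of the measures: it computes the silhouette score of zero-shot image embeddings grouped by class (the paper's construction, which also involves text embeddings, may differ) and the average cosine similarity between each image embedding and the text embedding of its correct label. The embeddings are assumed to be pre-computed with CLIP's encoders and L2-normalized, and the helper name is hypothetical.

```python
# Minimal sketch (not the authors' implementation) of the two task-difficulty
# measures discussed above, assuming pre-computed, L2-normalized CLIP embeddings.
import numpy as np
from sklearn.metrics import silhouette_score

def task_difficulty_measures(image_embs, text_embs, labels):
    """image_embs: (N, D) zero-shot image embeddings (L2-normalized)
    text_embs:  (C, D) class-prompt text embeddings (L2-normalized)
    labels:     (N,) integer class index for each image."""
    # Measure 1: silhouette score of image embeddings grouped by class
    # (cosine distance); higher means classes are better separated.
    sil = silhouette_score(image_embs, labels, metric="cosine")

    # Measure 2: average cosine similarity between each image embedding and
    # the text embedding of its correct label.
    cos = float(np.mean(np.sum(image_embs * text_embs[labels], axis=1)))
    return sil, cos

if __name__ == "__main__":
    # Toy stand-in data; in practice these come from CLIP's image/text encoders.
    rng = np.random.default_rng(0)
    img = rng.normal(size=(100, 512)); img /= np.linalg.norm(img, axis=1, keepdims=True)
    txt = rng.normal(size=(10, 512)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
    y = rng.integers(0, 10, size=100)
    print(task_difficulty_measures(img, txt, y))
```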
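A-CLIP is described only as training a subset of attention weights; the abstract does not specify which subset. The sketch below illustrates the general idea under the assumption that the Hugging Face transformers CLIP implementation is used (the authors may have used a different codebase and a different attention subset): freeze all parameters, then re-enable gradients only for the attention projections.

```python
# Hedged sketch of attention-only parameter-efficient fine-tuning of CLIP,
# in the spirit of the A-CLIP method described above (not the authors' code;
# the exact attention subset used in the paper may differ).
import torch
from transformers import CLIPConfig, CLIPModel

# Randomly initialized CLIP for illustration; in practice load pretrained
# weights, e.g. CLIPModel.from_pretrained("openai/clip-vit-base-patch32").
model = CLIPModel(CLIPConfig())

# Freeze everything, then unfreeze only attention parameters
# (q/k/v/out projection weights and biases in both encoders).
for name, param in model.named_parameters():
    param.requires_grad = "self_attn" in name

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"training {n_trainable / n_total:.1%} of all parameters")

# Only the unfrozen attention weights receive updates during fine-tuning.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```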
Related papers
- GLAD: Generalizable Tuning for Vision-Language Models [41.071911050087586]
We propose a simpler and more general framework called GLAD (Generalizable LoRA tuning with RegulArized GraDient).
We show that merely applying LoRA achieves performance in downstream tasks comparable to current state-of-the-art prompt-based methods.
arXiv Detail & Related papers (2025-07-17T12:58:15Z)
- NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics.
We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method.
Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z)
- Reasoning Multimodal Large Language Model: Data Contamination and Dynamic Evaluation [9.434966074326056]
Multimodal Large Language Models (MLLMs) show impressive vision-language benchmark performance, yet growing concerns about data contamination risk masking true generalization.
We propose a novel dynamic evaluation framework to rigorously assess MLLM generalization, moving beyond static benchmarks.
We demonstrate that fine-tuning on simulated test data (extreme contamination) drastically sharpens task-specific performance but harms overall generalization.
arXiv Detail & Related papers (2025-06-08T15:52:38Z)
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
- Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation [1.124958340749622]
We introduce a multitask learning framework to leverage correlations among the counting, detection, and segmentation tasks.
We develop a cross-position cut-and-paste for label augmentation and an entropy-based pseudo-label selection.
The proposed model is capable of significantly outperforming UDA methods and produces performance comparable to the supervised counterpart.
arXiv Detail & Related papers (2024-03-31T12:22:23Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Task-guided Disentangled Tuning for Pretrained Language Models [16.429787408467703]
We propose Task-guided Disentangled Tuning (TDT) for pretrained language models (PLMs).
TDT enhances the generalization of representations by disentangling task-relevant signals from entangled representations.
Experimental results on GLUE and CLUE benchmarks show that TDT gives consistently better results than fine-tuning with different PLMs.
arXiv Detail & Related papers (2022-03-22T03:11:39Z)
- A Multi-Task Deep Learning Framework for Building Footprint Segmentation [0.0]
We propose a joint optimization scheme for the task of building footprint delineation.
We also introduce two auxiliary tasks: image reconstruction and building footprint boundary segmentation.
In particular, we propose a deep multi-task learning (MTL) based unified fully convolutional framework.
arXiv Detail & Related papers (2021-04-19T15:07:27Z)
- A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation [74.8791451327354]
We propose a simple yet effective semi-supervised learning framework for semantic segmentation.
A set of simple design and training techniques can collectively improve the performance of semi-supervised semantic segmentation significantly.
Our method achieves state-of-the-art results in the semi-supervised settings on the Cityscapes and Pascal VOC datasets.
arXiv Detail & Related papers (2021-04-15T06:01:39Z)
- Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning [23.00300794016583]
State-of-the-art natural language understanding classification models follow two stages: pre-training followed by task-specific fine-tuning.
We propose a supervised contrastive learning (SCL) objective for the fine-tuning stage.
Our proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data.
arXiv Detail & Related papers (2020-11-03T01:10:39Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for the text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.