Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data
- URL: http://arxiv.org/abs/2207.10858v1
- Date: Fri, 22 Jul 2022 03:39:51 GMT
- Title: Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data
- Authors: Taha ValizadehAslani, Yiwen Shi, Jing Wang, Ping Ren, Yi Zhang, Meng Hu, Liang Zhao, Hualou Liang
- Abstract summary: Classification on long-tailed distributed data is a challenging problem.
Learning on tail classes is especially challenging during fine-tuning, when a pretrained model is transferred to a downstream task.
We propose a two-stage fine-tuning: we first fine-tune the final layer of the pretrained model with a class-balanced reweighting loss, and then we perform standard fine-tuning.
- Score: 11.66734752179563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification on long-tailed distributed data is a challenging problem,
which suffers from serious class-imbalance and hence poor performance on tail
classes with only a few samples. Owing to this paucity of samples, learning on
the tail classes is especially challenging during fine-tuning, when a
pretrained model is transferred to a downstream task. In this work, we present
a simple modification of standard fine-tuning to cope with these challenges.
Specifically, we propose a two-stage fine-tuning: we first fine-tune the final
layer of the pretrained model with a class-balanced reweighting loss, and then
we perform standard fine-tuning. Our modification has several benefits: (1) it
leverages pretrained representations by only fine-tuning a small portion of the
model parameters while keeping the rest untouched; (2) it allows the model to
learn an initial representation of the specific task; and importantly (3) it
protects the learning of tail classes from being at a disadvantage during the
model updating. We conduct extensive experiments on synthetic datasets of both
two-class and multi-class tasks of text classification as well as a real-world
application to ADME (i.e., absorption, distribution, metabolism, and excretion)
semantic labeling. The experimental results show that the proposed two-stage
fine-tuning outperforms both fine-tuning with conventional loss and fine-tuning
with a reweighting loss on the above datasets.
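The abstract describes the procedure but gives no reference implementation. Below is a minimal PyTorch sketch of the two-stage schedule, under stated assumptions: a toy model stands in for the pretrained encoder plus its final classification layer, and the class-balanced weights follow the effective-number scheme of Cui et al. (2019), which is one common instantiation of a class-balanced reweighting loss; the paper's exact loss may differ.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# --- Toy stand-ins for a pretrained encoder and a long-tailed dataset ---
torch.manual_seed(0)
num_classes, feat_dim = 4, 32
class_counts = torch.tensor([500., 120., 30., 5.])  # head -> tail

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(16, feat_dim), nn.ReLU())  # "pretrained" body
        self.classifier = nn.Linear(feat_dim, num_classes)                # final layer

    def forward(self, x):
        return self.classifier(self.encoder(x))

labels = torch.cat([torch.full((int(n),), c, dtype=torch.long)
                    for c, n in enumerate(class_counts)])
inputs = torch.randn(len(labels), 16)
loader = DataLoader(TensorDataset(inputs, labels), batch_size=64, shuffle=True)

# Class-balanced weights via the effective number of samples (an assumption:
# one common choice of reweighting; the paper's exact loss may differ).
beta = 0.999
effective_num = 1.0 - torch.pow(beta, class_counts)
cb_weights = (1.0 - beta) / effective_num
cb_weights = cb_weights / cb_weights.sum() * num_classes

model = ToyModel()

def run_stage(loss_fn, params, lr, epochs=3):
    """One fine-tuning stage: optimize `params` against `loss_fn`."""
    opt = torch.optim.AdamW(params, lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Stage 1: freeze the pretrained body; fine-tune only the final layer
# with the class-balanced reweighting loss.
for p in model.encoder.parameters():
    p.requires_grad = False
run_stage(nn.CrossEntropyLoss(weight=cb_weights), model.classifier.parameters(), lr=1e-3)

# Stage 2: unfreeze everything and perform standard fine-tuning with the plain loss.
for p in model.encoder.parameters():
    p.requires_grad = True
run_stage(nn.CrossEntropyLoss(), model.parameters(), lr=1e-4)
```

The key design choice in this sketch is that stage one updates only the final classification layer, so the tail classes shape the decision layer before the full network is exposed to the imbalanced gradient signal in stage two.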
Related papers
- Fine-Tuning is Fine, if Calibrated [33.42198023647517]
Fine-tuning a pre-trained model is shown to drastically degrade the model's accuracy on the other classes it had previously learned.
This paper systematically dissects the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?"
We find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes.
arXiv Detail & Related papers (2024-09-24T16:35:16Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views [28.081794908107604]
Fine-tuning is used to leverage the power of pre-trained foundation models in new downstream tasks.
Recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions.
We propose a novel generalizable fine-tuning method LEVI, where the pre-trained model is adaptively ensembled layer-wise with a small task-specific model.
arXiv Detail & Related papers (2024-02-07T08:16:40Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Overwriting Pretrained Bias with Finetuning Data [36.050345384273655]
We investigate bias when conceptualized as both spurious correlations between the target task and a sensitive attribute as well as underrepresentation of a particular group in the dataset.
We find that (1) models finetuned on top of pretrained models can indeed inherit their biases, but (2) this bias can be corrected for through relatively minor interventions to the finetuning dataset.
Our findings imply that careful curation of the finetuning dataset is important for reducing biases on a downstream task, and doing so can even compensate for bias in the pretrained model.
arXiv Detail & Related papers (2023-03-10T19:10:58Z)
- Delving into Semantic Scale Imbalance [45.30062061215943]
We define and quantify the semantic scale of classes, which is used to measure the feature diversity of classes.
We propose semantic-scale-balanced learning, including a general loss improvement scheme and a dynamic re-weighting training framework.
Comprehensive experiments show that dynamic semantic-scale-balanced learning consistently enables the model to perform superiorly on large-scale long-tailed and non-long-tailed natural and medical datasets.
arXiv Detail & Related papers (2022-12-30T09:40:09Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed using influence functions and the sensitive attributes of a validation set.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z)
- Mining Minority-class Examples With Uncertainty Estimates [102.814407678425]
In the real world, the frequency of occurrence of objects is naturally skewed, forming long-tail class distributions.
We propose an effective, yet simple, approach to overcome these challenges.
Our framework enhances the subdued tail-class activations and, thereafter, uses a one-class data-centric approach to effectively identify tail-class examples.
arXiv Detail & Related papers (2021-12-15T02:05:02Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)