Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced
Data
- URL: http://arxiv.org/abs/2207.10858v1
- Date: Fri, 22 Jul 2022 03:39:51 GMT
- Title: Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced
Data
- Authors: Taha ValizadehAslani, Yiwen Shi, Jing Wang, Ping Ren, Yi Zhang, Meng
Hu, Liang Zhao, Hualou Liang
- Abstract summary: Classification on long-tailed distributed data is a challenging problem.
Learning on tail classes is especially challenging for fine-tuning when transferring a pretrained model to a downstream task.
We propose a two-stage fine-tuning: we first fine-tune the final layer of the pretrained model with class-balanced reweighting loss, and then we perform the standard fine-tuning.
- Score: 11.66734752179563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification on long-tailed distributed data is a challenging problem,
which suffers from serious class-imbalance and hence poor performance on tail
classes with only a few samples. Owing to this paucity of samples, learning on
the tail classes is especially challenging for fine-tuning when
transferring a pretrained model to a downstream task. In this work, we present
a simple modification of standard fine-tuning to cope with these challenges.
Specifically, we propose a two-stage fine-tuning: we first fine-tune the final
layer of the pretrained model with class-balanced reweighting loss, and then we
perform the standard fine-tuning. Our modification has several benefits: (1) it
leverages pretrained representations by only fine-tuning a small portion of the
model parameters while keeping the rest untouched; (2) it allows the model to
learn an initial representation of the specific task; and importantly (3) it
protects the learning of tail classes from being at a disadvantage during the
model updating. We conduct extensive experiments on synthetic datasets of both
two-class and multi-class tasks of text classification as well as a real-world
application to ADME (i.e., absorption, distribution, metabolism, and excretion)
semantic labeling. The experimental results show that the proposed two-stage
fine-tuning outperforms both fine-tuning with conventional loss and fine-tuning
with a reweighting loss on the above datasets.
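A minimal PyTorch sketch of the two-stage procedure described above, under stated assumptions: the class-balanced reweighting loss is approximated here with simple inverse-frequency class weights (the paper's exact reweighting formula may differ), the model is assumed to expose its final layer as `model.classifier`, and the optimizers, learning rates, and epoch counts are illustrative rather than the authors' configuration.
```python
import torch
import torch.nn as nn


def class_balanced_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Inverse-frequency class weights: one simple reweighting choice."""
    counts = torch.bincount(labels, minlength=num_classes).float().clamp(min=1)
    return counts.sum() / (num_classes * counts)


def run_epoch(model, loader, criterion, optimizer, device):
    model.train()
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()


def two_stage_finetune(model, train_loader, train_labels, num_classes,
                       device="cpu", stage1_epochs=1, stage2_epochs=3):
    # Stage 1: freeze everything except the final (classifier) layer and
    # fine-tune it with the class-balanced reweighting loss.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():  # assumed name of the final layer
        p.requires_grad = True
    weights = class_balanced_weights(train_labels, num_classes).to(device)
    stage1_criterion = nn.CrossEntropyLoss(weight=weights)
    stage1_opt = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
    for _ in range(stage1_epochs):
        run_epoch(model, train_loader, stage1_criterion, stage1_opt, device)

    # Stage 2: unfreeze all parameters and perform standard fine-tuning
    # with the conventional (unweighted) loss.
    for p in model.parameters():
        p.requires_grad = True
    stage2_criterion = nn.CrossEntropyLoss()
    stage2_opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
    for _ in range(stage2_epochs):
        run_epoch(model, train_loader, stage2_criterion, stage2_opt, device)
    return model
```
The point of stage 1 is that the classification head is adapted to the task under a loss that does not let head classes dominate, so that when the full model is updated in stage 2 the tail classes start from a less disadvantaged position.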
Related papers
- Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting [15.251425165987987]
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities.
We propose a sample weighting scheme for the fine-tuning data based on the pre-trained model's losses (a rough sketch of this idea appears after the list).
We empirically demonstrate the efficacy of our method on both language and vision tasks.
arXiv Detail & Related papers (2025-02-05T00:49:59Z) - Prior2Posterior: Model Prior Correction for Long-Tailed Learning [0.41248472494152805]
We propose a novel approach to accurately model the effective prior of a trained model using textita posteriori probabilities.
We show that the proposed approach achieves new state-of-the-art (SOTA) on several benchmark datasets from the long-tail literature.
arXiv Detail & Related papers (2024-12-21T08:49:02Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views [28.081794908107604]
Fine-tuning is used to leverage the power of pre-trained foundation models in new downstream tasks.
Recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions.
We propose a novel generalizable fine-tuning method LEVI, where the pre-trained model is adaptively ensembled layer-wise with a small task-specific model.
arXiv Detail & Related papers (2024-02-07T08:16:40Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Delving into Semantic Scale Imbalance [45.30062061215943]
We define and quantify the semantic scale of classes, which is used to measure the feature diversity of classes.
We propose semantic-scale-balanced learning, including a general loss improvement scheme and a dynamic re-weighting training framework.
Comprehensive experiments show that dynamic semantic-scale-balanced learning consistently enables the model to perform superiorly on large-scale long-tailed and non-long-tailed natural and medical datasets.
arXiv Detail & Related papers (2022-12-30T09:40:09Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with
Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed to balance model performance across groups using a validation set with sensitive attributes.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - Mining Minority-class Examples With Uncertainty Estimates [102.814407678425]
In the real world, the frequency of occurrence of objects is naturally skewed forming long-tail class distributions.
We propose an effective, yet simple, approach to overcome these challenges.
Our framework enhances the subdued tail-class activations and, thereafter, uses a one-class data-centric approach to effectively identify tail-class examples.
arXiv Detail & Related papers (2021-12-15T02:05:02Z) - Exploring Strategies for Generalizable Commonsense Reasoning with
Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information it provides and is not responsible for any consequences of its use.