Related papers: LIFT+: Lightweight Fine-Tuning for Long-Tail Learning

URL: http://arxiv.org/abs/2504.13282v1
Date: Thu, 17 Apr 2025 18:50:47 GMT
Title: LIFT+: Lightweight Fine-Tuning for Long-Tail Learning
Authors: Jiang-Xin Shi, Tong Wei, Yu-Feng Li
Abstract summary: LIFT+ is an innovative lightweight fine-tuning framework to optimize consistent class conditions. Our framework provides an efficient and accurate pipeline that facilitates fast convergence and model compactness.
Score: 45.187004699024435
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The fine-tuning paradigm has emerged as a prominent approach for addressing long-tail learning tasks in the era of foundation models. However, the impact of fine-tuning strategies on long-tail learning performance remains unexplored. In this work, we disclose that existing paradigms exhibit a profound misuse of fine-tuning methods, leaving significant room for improvement in both efficiency and accuracy. Specifically, we reveal that heavy fine-tuning (fine-tuning a large proportion of model parameters) can lead to non-negligible performance deterioration on tail classes, whereas lightweight fine-tuning demonstrates superior effectiveness. Through comprehensive theoretical and empirical validation, we identify this phenomenon as stemming from inconsistent class conditional distributions induced by heavy fine-tuning. Building on this insight, we propose LIFT+, an innovative lightweight fine-tuning framework to optimize consistent class conditions. Furthermore, LIFT+ incorporates semantic-aware initialization, minimalist data augmentation, and test-time ensembling to enhance adaptation and generalization of foundation models. Our framework provides an efficient and accurate pipeline that facilitates fast convergence and model compactness. Extensive experiments demonstrate that LIFT+ significantly reduces both training epochs (from $\sim$100 to $\leq$15) and learned parameters (less than 1%), while surpassing state-of-the-art approaches by a considerable margin. The source code is available at https://github.com/shijxcs/LIFT-plus.

Related papers

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z)
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning [32.86747945245703]
Supervised fine-tuning of LLMs can yield strong reasoning capabilities. Full fine-tuning (Full FT) is computationally expensive and susceptible to overfitting and forgetting. Sparse fine-tuning, which previously achieved notable success, offers a promising trade-off between efficiency and effectiveness.
arXiv Detail & Related papers (2025-06-01T01:31:50Z)
HFT: Half Fine-Tuning for Large Language Models [42.60438623804577]
Large language models (LLMs) with one or more fine-tuning phases have become a necessary step to unlock various capabilities. In this paper, we find that by regularly resetting partial parameters, LLMs can restore some of the original knowledge. We introduce Half Fine-Tuning (HFT) for LLMs, as a substitute for full fine-tuning (FFT), to mitigate the forgetting issues.
arXiv Detail & Related papers (2024-04-29T07:07:58Z)
FeTrIL++: Feature Translation for Exemplar-Free Class-Incremental Learning with Hill-Climbing [3.533544633664583]
Exemplar-free class-incremental learning (EFCIL) poses significant challenges, primarily due to catastrophic forgetting. Traditional EFCIL approaches typically skew towards either model plasticity through successive fine-tuning or stability. This paper builds upon the foundational FeTrIL framework to examine the efficacy of various oversampling techniques and dynamic optimization strategies.
arXiv Detail & Related papers (2024-03-12T08:34:05Z)
Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT). We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data. The training process of Large Language Models (LLMs) generally incurs the update of significant parameters. This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts [42.693469918949006]
In this paper, we disclose that heavy fine-tuning may even lead to non-negligible performance deterioration on tail classes. We develop a low-complexity and accurate long-tail learning algorithm, LIFT, with the goal of facilitating fast prediction and compact models.
arXiv Detail & Related papers (2023-09-18T17:50:56Z)
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
