Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction
- URL: http://arxiv.org/abs/2510.00268v1
- Date: Tue, 30 Sep 2025 20:42:13 GMT
- Title: Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction
- Authors: Zhexiong Liu, Diane Litman
- Abstract summary: Large Language Models (LLMs) have shown extraordinary success across various text generation tasks. However, their potential for simple yet essential text classification remains underexplored. We introduce a plug-and-play layer-wise parameter-efficient fine-tuning framework, i.e., IR-Tuning.
- Score: 0.70303436819479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown extraordinary success across various text generation tasks; however, their potential for simple yet essential text classification remains underexplored, as LLM pre-training tends to emphasize generation over classification. While LLMs with instruction tuning can transform classification into a generation task, they often struggle to categorize nuanced texts. One such example is text revision, which involves nuanced edits between pairs of texts. Although simply fine-tuning LLMs for revision classification seems plausible, it requires a large amount of revision annotations, which are exceptionally expensive and scarce in the community. To address this issue, we introduce a plug-and-play layer-wise parameter-efficient fine-tuning (PEFT) framework, i.e., IR-Tuning, which fine-tunes a subset of important LLM layers that are dynamically selected based on their gradient norm distribution, while freezing those of redundant layers. Extensive experiments suggest that IR-Tuning surpasses several layer-wise PEFT baselines over diverse text revisions, while achieving fast convergence, low GPU memory consumption, and effectiveness on small revision corpora.
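The abstract describes selecting trainable layers by their gradient-norm distribution while freezing the rest. A minimal sketch of that idea is below, using a toy PyTorch stack; the ranking criterion (per-layer gradient norm on one probe batch) and the top-k budget are illustrative assumptions, not IR-Tuning's exact procedure.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM: a stack of transformer-like blocks plus a classification head.
model = nn.ModuleList([nn.Linear(64, 64) for _ in range(12)])
head = nn.Linear(64, 4)  # revision-intention classes

def forward(x):
    for layer in model:
        x = torch.relu(layer(x))
    return head(x)

# 1) Probe pass: accumulate per-layer gradient norms on a small labeled batch.
x = torch.randn(8, 64)
y = torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(forward(x), y)
loss.backward()

grad_norms = [
    sum(p.grad.norm() ** 2 for p in layer.parameters()).sqrt().item()
    for layer in model
]

# 2) Keep only the top-k layers by gradient norm trainable; freeze the redundant ones.
k = 3  # assumed budget of "important" layers
top_layers = sorted(range(len(model)), key=lambda i: -grad_norms[i])[:k]
for i, layer in enumerate(model):
    for p in layer.parameters():
        p.requires_grad = i in top_layers

print("trainable layers:", sorted(top_layers))
```

In a full training loop, only the selected layers (and the task head) would receive optimizer updates, which is what keeps GPU memory and convergence time low.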
Related papers
- Intention-Adaptive LLM Fine-Tuning for Text Revision Generation [0.70303436819479]
Large Language Models (LLMs) have achieved impressive capabilities in context-based text generation tasks. We propose Intention-Tuning, an intention-adaptive layer-wise LLM fine-tuning framework. We show that Intention-Tuning is effective and efficient on small revision corpora.
arXiv Detail & Related papers (2026-01-31T03:01:09Z) - How Do LLM-Generated Texts Impact Term-Based Retrieval Models? [76.92519309816008]
This paper investigates the influence of large language models (LLMs) on term-based retrieval models. Our linguistic analysis reveals that LLM-generated texts exhibit smoother high-frequency and steeper low-frequency Zipf slopes. Our study further explores whether term-based retrieval models demonstrate source bias, concluding that these models prioritize documents whose term distributions closely correspond to those of the queries.
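The Zipf-slope observation above can be checked with a simple log-log regression over a rank-frequency curve. The split point between "high-frequency" and "low-frequency" ranks in this sketch is an arbitrary illustration, not the paper's setting.

```python
import numpy as np
from collections import Counter

def zipf_slopes(tokens, split_rank=100):
    """Fit log-frequency vs. log-rank slopes for the head and tail of the distribution."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    log_r, log_f = np.log(ranks), np.log(freqs)
    cut = min(split_rank, len(freqs))
    head_slope = np.polyfit(log_r[:cut], log_f[:cut], 1)[0]   # high-frequency terms
    tail_slope = np.polyfit(log_r[cut:], log_f[cut:], 1)[0]   # low-frequency terms
    return head_slope, tail_slope

# Usage: compare a human-written corpus against an LLM-generated one (toy example).
print(zipf_slopes("the cat sat on the mat and the dog sat too".split(), split_rank=3))
```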
arXiv Detail & Related papers (2025-08-25T06:43:27Z) - Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced Forecasting [44.91360223102709]
Time series (TS) data are ubiquitous across various application areas, rendering time series forecasting (TSF) a fundamental task. Existing methods are inherently constrained by their shallow integration of TS information. We propose the Multi-layer Steerable Embedding Fusion (MSEF) to mitigate the progressive loss of TS information in deeper layers.
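A toy sketch of the multi-layer fusion idea: a per-layer steerable projection of the time-series embedding is added to the hidden state entering each frozen block, so TS information is re-injected at every depth. The fusion operator (plain addition) and the placeholder encoder are assumptions, not MSEF's actual design.

```python
import torch
import torch.nn as nn

class ToyMSEF(nn.Module):
    def __init__(self, d=64, n_layers=6, ts_len=32):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(d, d) for _ in range(n_layers)])
        for p in self.blocks.parameters():          # the LLM backbone stays frozen
            p.requires_grad = False
        self.ts_encoder = nn.Linear(ts_len, d)      # placeholder time-series encoder
        # One learnable steering projection per layer, so each depth gets its own view of the series.
        self.steer = nn.ModuleList([nn.Linear(d, d) for _ in range(n_layers)])

    def forward(self, h, ts):
        z = self.ts_encoder(ts)                     # shared time-series embedding
        for block, steer in zip(self.blocks, self.steer):
            h = torch.relu(block(h + steer(z)))     # fuse TS information at every layer
        return h

model = ToyMSEF()
out = model(torch.randn(4, 64), torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 64])
```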
arXiv Detail & Related papers (2025-08-22T03:22:10Z) - Scaling Textual Gradients via Sampling-Based Momentum [59.94928977345951]
The Textual Gradient Descent (TGD) framework has emerged as a promising data-driven approach. Scaling the number of training examples improves results at first but later degrades TGD's performance. We propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), a method that facilitates scalable in-context learning by reweighting prompt sampling.
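A rough sketch of momentum-reweighted sampling of training examples for prompt optimization follows. The exponential-moving-average update and the use of a per-example "usefulness" signal are assumptions about the general idea, not TSGD-M's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples, beta = 50, 0.9          # beta: momentum coefficient (assumed)
momentum = np.zeros(n_examples)     # smoothed per-example signal

def sample_minibatch(k=8):
    """Sample examples for the next textual-gradient step, favoring high-momentum ones."""
    logits = momentum - momentum.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(n_examples, size=k, replace=False, p=probs)

for step in range(5):
    batch = sample_minibatch()
    # Placeholder signal: in the real setting this would reflect how much each example's
    # feedback improved the prompt (e.g., a validation-score delta).
    signal = rng.random(len(batch))
    momentum[batch] = beta * momentum[batch] + (1 - beta) * signal

print(sample_minibatch())
```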
arXiv Detail & Related papers (2025-05-31T05:35:45Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
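The summary above only states that RTD adapts without fine-tuning; the mechanism is not described here. Purely as an illustration of training-free adaptation from a frozen model's representations, the sketch below classifies by nearest class reference in hidden-state space. It should not be read as RTD's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_state(example_id):
    """Placeholder for a frozen LLM's final hidden state for an input (no weights are updated)."""
    return rng.standard_normal(64) + example_id % 3   # fake, class-correlated features

# Build one reference vector per class from a handful of labeled examples.
refs = {}
for label in range(3):
    examples = [hidden_state(i) for i in range(label, 30, 3)]
    refs[label] = np.mean(examples, axis=0)

def predict(example_id):
    h = hidden_state(example_id)
    cos = lambda a, b: np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(refs, key=lambda c: cos(h, refs[c]))

print([predict(i) for i in range(6)])
```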
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification [13.319594321038926]
We propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task.
We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoying low training overhead.
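The abstract frames LLMEmbed as leveraging lightweight-LLM embeddings for classification. A generic version of that recipe (frozen embeddings plus a small trained head) is sketched below; the pooling choice and the logistic-regression classifier are illustrative assumptions, and the embedding function is a placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def embed(text):
    """Placeholder for mean-pooled hidden states from a frozen lightweight LLM."""
    return rng.standard_normal(256) + len(text) * 0.01   # fake features

texts = ["minor wording tweak", "added a new supporting argument"] * 20
labels = [0, 1] * 20

X = np.stack([embed(t) for t in texts])
clf = LogisticRegression(max_iter=1000).fit(X, labels)   # only the cheap head is trained
print("train accuracy:", clf.score(X, labels))
```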
arXiv Detail & Related papers (2024-06-06T03:46:59Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs). We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
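A minimal sketch of the "pluggable virtual tokens" idea: a few trainable embeddings are prepended to the input while the base model stays frozen. The number of tokens, the GRU stand-in for the LLM, and the plain prepend-and-train setup are assumptions about the general technique, not the paper's exact method.

```python
import torch
import torch.nn as nn

d, n_virtual = 64, 4
backbone = nn.GRU(d, d, batch_first=True)          # stand-in for a frozen LLM
head = nn.Linear(d, 2)
for p in list(backbone.parameters()) + list(head.parameters()):
    p.requires_grad = False

virtual = nn.Parameter(torch.randn(n_virtual, d) * 0.02)   # the only trainable weights
opt = torch.optim.Adam([virtual], lr=1e-3)

tokens = torch.randn(8, 16, d)                     # fake retrieved-context token embeddings
targets = torch.randint(0, 2, (8,))

for _ in range(3):
    prefixed = torch.cat([virtual.expand(8, -1, -1), tokens], dim=1)  # prepend virtual tokens
    out, _ = backbone(prefixed)
    loss = nn.functional.cross_entropy(head(out[:, -1]), targets)
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))
```

Because only the virtual embeddings carry gradients, they can be trained once and then plugged in or removed without touching the backbone.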
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation.
To mitigate the computational overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, an input-adaptive feed-forward skipping strategy.
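A toy illustration of input-adaptive feed-forward skipping: the FFN sub-block is bypassed when the representation has barely changed after attention. The cosine-similarity criterion and threshold below are placeholder assumptions, not FFN-SkipLLM's actual skipping policy.

```python
import torch
import torch.nn as nn

d, n_layers, thresh = 64, 8, 0.98
attn = [nn.Linear(d, d) for _ in range(n_layers)]   # stand-ins for attention sub-blocks
ffn = [nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d)) for _ in range(n_layers)]

def forward(x):
    skipped = 0
    for a, f in zip(attn, ffn):
        h = x + a(x)                                 # attention residual
        sim = nn.functional.cosine_similarity(h, x, dim=-1).mean()
        if sim > thresh:                             # representation barely moved: skip the FFN
            x, skipped = h, skipped + 1
        else:
            x = h + f(h)                             # full block
    return x, skipped

out, skipped = forward(torch.randn(2, d))
print(out.shape, "FFN blocks skipped:", skipped)
```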
arXiv Detail & Related papers (2024-04-05T02:35:43Z) - Streamlining Redundant Layers to Compress Large Language Models [21.27944103424621]
This paper introduces LLM-Streamline, a pioneering work on layer pruning for large language models (LLMs). It is based on the observation that different layers have varying impacts on hidden states, enabling the identification of less important layers to be pruned. Experiments show that LLM-Streamline outperforms both previous and concurrent state-of-the-art pruning methods in terms of both performance and training efficiency.
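A simple sketch of the observation the summary mentions: rank layers by how little they change the hidden state (cosine similarity between a layer's input and output) and treat the least-changing ones as pruning candidates. This matches the stated intuition, but the concrete procedure and the lightweight replacement module the paper trains are omitted here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(12)])  # toy decoder stack

x = torch.randn(16, 64)
importance = []
for layer in layers:
    y = x + layer(x)                                             # residual block
    sim = nn.functional.cosine_similarity(x, y, dim=-1).mean().item()
    importance.append(1 - sim)                                   # small change => low importance
    x = y

prune = sorted(range(len(layers)), key=lambda i: importance[i])[:3]
print("least important layers (pruning candidates):", sorted(prune))
```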
arXiv Detail & Related papers (2024-03-28T04:12:13Z) - Pushing The Limit of LLM Capacity for Text Classification [27.684335455517417]
We propose RGPT, an adaptive boosting framework tailored to produce a specialized text classification LLM.
We show that RGPT significantly outperforms 8 SOTA PLMs and 7 SOTA LLMs on four benchmarks by 1.36% on average.
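The summary describes RGPT as an adaptive-boosting framework. Below is a generic boosting-style reweighting loop with placeholder base classifiers, meant only to illustrate the ensemble idea; RGPT's actual construction with fine-tuned LLM members is not reproduced.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = rng.standard_normal((200, 16)), rng.integers(0, 2, 200)

weights = np.full(len(y), 1 / len(y))
ensemble = []
for _ in range(3):
    # Stand-in for "fit one specialized member on the reweighted data".
    clf = DecisionTreeClassifier(max_depth=2).fit(X, y, sample_weight=weights)
    pred = clf.predict(X)
    err = np.clip(np.average(pred != y, weights=weights), 1e-6, 1 - 1e-6)
    alpha = 0.5 * np.log((1 - err) / err)
    ensemble.append((alpha, clf))
    # Upweight examples the current member still gets wrong.
    weights *= np.exp(alpha * (pred != y))
    weights /= weights.sum()

votes = sum(a * (2 * c.predict(X) - 1) for a, c in ensemble)
print("ensemble accuracy:", ((votes > 0).astype(int) == y).mean())
```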
arXiv Detail & Related papers (2024-02-12T08:14:03Z) - SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification [6.227343685358882]
We present SPIN, a model-agnostic framework that sparsifies and integrates internal neurons from the intermediate layers of Large Language Models for text classification.
SPIN significantly improves text classification accuracy, efficiency, and interpretability.
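A small sketch of the sparsify-and-integrate idea: pick the most salient neurons from several intermediate layers' hidden states, concatenate them, and train a linear classifier on top of the frozen model. The saliency criterion used here (variance across a probe set) is an assumption for illustration, not SPIN's actual selection rule.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_docs, d, k = 200, 128, 16
labels = rng.integers(0, 2, n_docs)

# Placeholder: hidden states from three intermediate layers of a frozen LLM.
layer_states = [rng.standard_normal((n_docs, d)) + labels[:, None] * 0.3 for _ in range(3)]

features = []
for states in layer_states:
    salient = np.argsort(states.var(axis=0))[-k:]     # keep the k most variable neurons per layer
    features.append(states[:, salient])               # sparsify
X = np.concatenate(features, axis=1)                  # integrate across layers

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```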
arXiv Detail & Related papers (2023-11-27T16:28:20Z) - CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
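A compact sketch of the contrastive step described above: given an anchor representation plus retrieved positives and negatives, compute an InfoNCE-style loss. The retrieval stage and CSS-LM's exact objective are not reproduced; this only illustrates a contrastive loss over embeddings.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss: pull retrieved positives toward the anchor, push negatives away."""
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    pos_sim = anchor @ pos.T / tau             # (1, P)
    neg_sim = anchor @ neg.T / tau             # (1, N)
    logits = torch.cat([pos_sim, neg_sim], dim=-1)
    # Each positive should score higher than every negative.
    log_probs = F.log_softmax(logits, dim=-1)
    return -log_probs[:, : pos.size(0)].mean()

anchor = torch.randn(1, 64)
print(float(contrastive_loss(anchor, torch.randn(4, 64), torch.randn(16, 64))))
```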
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.