RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models
- URL: http://arxiv.org/abs/2505.08463v2
- Date: Thu, 29 May 2025 05:01:48 GMT
- Title: RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models
- Authors: Fujun Zhang, Xiaoying Fan, XiangDong Su, Guanglai Gao
- Abstract summary: Fine-tuning pre-trained language models (PLMs) has become a dominant paradigm in applying PLMs to downstream tasks. This paper tackles this challenge by learning to calibrate the representation of PLMs in the latent space. In the proposed representation calibration method (RepCali), we integrate a specific calibration block into the latent space after the encoder and use the calibrated output as the decoder input.
- Score: 14.214116482595461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pre-trained language models (PLMs) has become a dominant paradigm in applying PLMs to downstream tasks. However, with limited fine-tuning, PLMs still struggle with the discrepancy between the representation produced by the PLM's encoder and the optimal input to the PLM's decoder. This paper tackles this challenge by learning to calibrate the representation of PLMs in the latent space. In the proposed representation calibration method (RepCali), we integrate a specific calibration block into the latent space after the encoder and use the calibrated output as the decoder input. The merits of the proposed RepCali include its universality to all PLMs with encoder-decoder architectures, its plug-and-play nature, and its ease of implementation. Extensive experiments on 25 PLM-based models across 8 tasks (including both English and Chinese datasets) demonstrate that RepCali offers desirable enhancements to PLMs (including LLMs) and significantly improves performance on downstream tasks. Comparison experiments across 4 benchmark tasks indicate that RepCali is superior to representative fine-tuning baselines.
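As a rough illustration of the plug-and-play idea, the sketch below inserts a small learnable block between the encoder and decoder of a BART model and feeds the calibrated hidden states to the decoder. The residual-MLP design of the block and the choice of BART are assumptions for illustration only; the abstract does not specify RepCali's exact calibration block.

```python
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration, BartTokenizer

class CalibrationBlock(nn.Module):
    """Hypothetical calibration block: a small residual MLP plus LayerNorm
    applied to the encoder's latent representation before it reaches the
    decoder. RepCali's actual block design may differ; this is only a sketch."""
    def __init__(self, d_model: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, encoder_hidden: torch.Tensor) -> torch.Tensor:
        # Residual correction: the original representation stays recoverable.
        return self.norm(encoder_hidden + self.net(encoder_hidden))

# Plug-and-play usage with an encoder-decoder PLM (BART is assumed here).
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
calibrator = CalibrationBlock(model.config.d_model)

inputs = tokenizer("fine-tune with calibrated latent representations", return_tensors="pt")
encoder_hidden = model.get_encoder()(**inputs).last_hidden_state
calibrated = calibrator(encoder_hidden)  # calibrate in the latent space

# Feed the calibrated latent to the decoder instead of the raw encoder output.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
decoder_out = model.get_decoder()(
    input_ids=decoder_input_ids,
    encoder_hidden_states=calibrated,
)
```

During fine-tuning, both the PLM and the calibration block would be updated, with the block adding only a small number of extra parameters.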
Related papers
- Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching [16.725632407644884]
We introduce a flipped knowledge distillation paradigm, where a Large Language Model learns from a Smaller Language Model. Specifically, we address the architectural gap between decoder-only LLMs and smaller encoder-based models. Experiments on financial and healthcare benchmarks, as well as real-world applications, confirm its effectiveness.
arXiv Detail & Related papers (2025-07-08T02:54:15Z)
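For readers unfamiliar with the distillation direction being flipped, here is a minimal sketch of such a loss, assuming the large model (student) is trained against soft targets from a small text-matching model (teacher). This is the standard KD formulation; the paper's exact objective and its bridging of the decoder-only/encoder-only gap are not reproduced here.

```python
import torch.nn.functional as F

def flipped_kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """KD loss with the roles flipped: the large model (student) learns from
    soft targets produced by a small text-matching model (teacher), mixed with
    the usual supervised cross-entropy loss."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    supervised = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * supervised
```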
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding [50.9948753314669]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradation on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z)
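A heavily simplified illustration of the parallel-decoding idea follows. `propose_block` is a hypothetical interface standing in for a diffusion LLM that drafts several tokens at once, and the acceptance/adaptation rule is a generic heuristic, not the APD criterion from the paper.

```python
def adaptive_parallel_decode(propose_block, max_len=256, threshold=0.9, max_block=16):
    """Illustrative heuristic only, not the APD algorithm itself.
    `propose_block(prefix, k)` asks the model to draft k tokens in parallel
    and returns (token_ids, confidences)."""
    generated, block = [], 4
    while len(generated) < max_len:
        tokens, confs = propose_block(generated, block)
        # Accept the longest prefix of the drafted block that is confident enough.
        accepted = 0
        for tok, conf in zip(tokens, confs):
            if conf < threshold:
                break
            generated.append(tok)
            accepted += 1
        if accepted == 0:
            generated.append(tokens[0])  # always make progress with one token
        # Adapt the degree of parallelism to how much of the block was usable.
        block = min(max_block, block * 2) if accepted == len(tokens) else max(1, accepted)
    return generated
```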
- An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning [52.29223403698673]
This paper examines the use of Conformal Language Modelling (CLM) alongside Answer Set Programming (ASP). We apply CLM to generate sets of ASP programs from an LLM, providing statistical guarantees on the correctness of the outputs. Experimental results show that CLM significantly outperforms baseline models that use standard sampling methods.
arXiv Detail & Related papers (2025-03-07T14:10:10Z)
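As background on the conformal side, the sketch below shows generic split-conformal calibration and set construction over sampled candidates (here, ASP programs). It is a simplification and not the specific CLM procedure used in the paper.

```python
import numpy as np

def conformal_threshold(cal_nonconformity, alpha=0.1):
    """Split-conformal calibration: return the ceil((n+1)(1-alpha))/n empirical
    quantile of nonconformity scores computed on a held-out calibration set."""
    n = len(cal_nonconformity)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_nonconformity, level, method="higher")

def prediction_set(candidates, nonconformity, qhat):
    """Keep every sampled candidate (e.g., an ASP program drafted by the LLM)
    whose nonconformity does not exceed the calibrated threshold."""
    return [c for c, s in zip(candidates, nonconformity) if s <= qhat]
```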
- Better Instruction-Following Through Minimum Bayes Risk [48.879360919760074]
General-purpose LLM judges capable of human-level evaluation provide a scalable and accurate way of evaluating instruction-following LLMs. One promising way of leveraging LLM judges for supervision is through Minimum Bayes Risk (MBR) decoding. MBR decoding uses a reference-based evaluator to select a high-quality output from amongst a set of candidate outputs.
arXiv Detail & Related papers (2024-10-03T18:48:38Z)
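The MBR selection rule itself is simple enough to sketch: score each candidate against the others with a reference-based utility (for instance an LLM-judge score, whose interface is assumed here) and return the candidate with the highest average.

```python
def mbr_select(candidates, utility):
    """Pick the candidate with the highest expected utility, using the other
    candidates as pseudo-references. `utility(hyp, ref)` is any reference-based
    evaluator, e.g. an LLM-judge score (interface assumed)."""
    best, best_score = None, float("-inf")
    for hyp in candidates:
        others = [ref for ref in candidates if ref is not hyp]
        score = sum(utility(hyp, ref) for ref in others) / max(1, len(others))
        if score > best_score:
            best, best_score = hyp, score
    return best
```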
- Multi-Prompting Decoder Helps Better Language Understanding [23.084538462710125]
We propose a simple yet effective Multi-Prompting Decoder (MPD) framework for MaaS adaptation.
Our method achieves new state-of-the-art results on multiple natural language understanding datasets under the few-shot setting.
arXiv Detail & Related papers (2024-06-10T13:58:46Z)
- CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences [5.165576022684194]
We propose using the LLM-as-a-Judge methodology to evaluate the alignment of LLMs with coding preferences. CodeUltraFeedback consists of 10,000 coding instructions, each annotated with four responses generated from a diverse pool of 14 LLMs. In turn, we explore the usage of CodeUltraFeedback as feedback data to fine-tune and align CodeLlama-7B-Instruct using supervised fine-tuning (SFT) and reinforcement learning from AI feedback (RLAIF) with direct preference optimization (DPO).
arXiv Detail & Related papers (2024-03-14T01:51:35Z)
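Of the two alignment routes mentioned, DPO has a compact closed-form objective. The sketch below is the standard DPO loss over per-sequence log-probabilities, shown only to make the preference-optimization step concrete; the hyperparameter beta=0.1 is illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective over per-sequence log-probabilities: push the
    policy to prefer the chosen response over the rejected one by a larger
    margin than a frozen reference model does."""
    margins = beta * ((policy_chosen_logp - policy_rejected_logp)
                      - (ref_chosen_logp - ref_rejected_logp))
    return -F.logsigmoid(margins).mean()
```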
- On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation [76.52997424694767]
This study addresses the application of encoder-only Pre-trained Language Models (PLMs) in keyphrase generation (KPG).
With encoder-only PLMs, although keyphrase extraction (KPE) with Conditional Random Fields slightly excels in identifying present keyphrases, the KPG formulation yields a broader spectrum of keyphrase predictions.
We also identify a favorable parameter allocation towards model depth rather than width when employing encoder-decoder architectures with encoder-only PLMs.
arXiv Detail & Related papers (2024-02-21T18:57:54Z)
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt [96.24800696597707]
We introduce a new perspective to optimize this trade-off by prompting compressed models.
We propose a soft prompt learning method where we expose the compressed model to the prompt learning process.
Our experimental analysis suggests our soft prompt strategy greatly improves the performance of the 8x compressed LLaMA-7B model.
arXiv Detail & Related papers (2023-05-17T20:45:13Z)
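To make the soft-prompt idea concrete, here is a generic prompt-tuning module that prepends a few learnable embeddings to the (frozen, compressed) model's input embeddings; the transferability mechanism described in the paper is not reproduced here.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Generic soft-prompt module: a few learnable embeddings are prepended to
    the input embeddings of a frozen (here, compressed) model, and only these
    prompt parameters are updated during tuning."""
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)  # [B, n_tokens + T, d]
```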
- Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations [51.75960511842552]
Fine-tuning of pretrained language models (PLMs) is prone to overfitting in low-resource scenarios.
We present a novel method that operates on the hidden representations of a PLM to reduce overfitting.
arXiv Detail & Related papers (2022-11-16T09:39:29Z)
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)