Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLA Adaptations for T5
- URL: http://arxiv.org/abs/2512.03803v1
- Date: Wed, 03 Dec 2025 13:54:11 GMT
- Title: Enhancing Instruction-Following Capabilities in Seq2Seq Models: DoLA Adaptations for T5
- Authors: Huey Sun, Anabel Yong, Lorenzo Gilly, Felipe Jin
- Abstract summary: This work adapts DoLa for the T5 and FLAN-T5 model families and evaluates its impact on the models' instruction-following capabilities. Our results show that DoLa improves the faithfulness of text generation for certain categories of tasks and harms others. To understand these results, we present a layer-by-layer analysis of logit evolution in a FLAN-T5 model to quantify DoLa's impact on token output probabilities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive decoding is a lightweight and effective inference-time method that improves the quality of text generation in Large Language Models. However, algorithms such as DoLa (Decoding by Contrasting Layers) have only been implemented in decoder-only architectures and studied for their impact on improving factuality. This work adapts DoLa for the T5 and FLAN-T5 model families and evaluates its impact on the models' instruction-following capabilities; to our knowledge, this is the first implementation of a contrastive decoding strategy in an encoder-decoder architecture. Our results show that DoLa improves the faithfulness of text generation for certain categories of tasks and harms others. To understand these results, we present a layer-by-layer analysis of logit evolution in a FLAN-T5 model to quantify DoLa's impact on token output probabilities.
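To make the adaptation concrete, the sketch below shows one way a DoLa-style contrastive step could be wired into a FLAN-T5 decoder with Hugging Face Transformers: the final decoder layer's next-token logits are contrasted against logits projected from an intermediate ("premature") decoder layer through the shared LM head. This is an illustrative approximation under stated assumptions (model checkpoint, premature layer index, plausibility threshold, greedy decoding), not the authors' implementation.

```python
# Minimal sketch of DoLa-style contrastive decoding on an encoder-decoder
# model (FLAN-T5). Illustrative only: the model size, the "premature" layer
# choice, the plausibility threshold, and greedy selection are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # assumed; the paper studies the T5/FLAN-T5 families
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

prompt = "Answer the question: What is the capital of France?"
enc = tok(prompt, return_tensors="pt")
decoder_ids = torch.tensor([[model.config.decoder_start_token_id]])

premature_layer = 4  # assumed early decoder layer to contrast against
alpha = 0.1          # adaptive-plausibility threshold, as in the original DoLa paper

with torch.no_grad():
    for _ in range(20):
        out = model(
            input_ids=enc.input_ids,
            attention_mask=enc.attention_mask,
            decoder_input_ids=decoder_ids,
            output_hidden_states=True,
        )
        # "Mature" distribution: final-decoder-layer logits from the LM head.
        mature_logits = out.logits[:, -1, :]

        # "Premature" distribution: an intermediate decoder hidden state passed
        # through the final layer norm and the same LM head (FLAN-T5's head is
        # untied from the input embeddings, so no extra rescaling is applied).
        h_early = out.decoder_hidden_states[premature_layer][:, -1, :]
        premature_logits = model.lm_head(model.decoder.final_layer_norm(h_early))

        # Contrast the two distributions in log space, keeping only tokens the
        # mature distribution already considers plausible.
        log_mature = torch.log_softmax(mature_logits, dim=-1)
        log_premature = torch.log_softmax(premature_logits, dim=-1)
        plausible = log_mature >= torch.log(torch.tensor(alpha)) + log_mature.max()
        scores = (log_mature - log_premature).masked_fill(~plausible, float("-inf"))

        next_id = scores.argmax(dim=-1, keepdim=True)  # greedy pick of contrasted scores
        decoder_ids = torch.cat([decoder_ids, next_id], dim=-1)
        if next_id.item() == model.config.eos_token_id:
            break

print(tok.decode(decoder_ids[0], skip_special_tokens=True))
```

Note that the encoder pass is left untouched; only the decoder stack supplies the premature and mature distributions, which is the main structural difference from applying DoLa to decoder-only models.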
Related papers
- Readability-Robust Code Summarization via Meta Curriculum Learning [53.44612630063336]
In the real world, code is often poorly structured or obfuscated, significantly degrading model performance. We propose RoFTCodeSum, a novel fine-tuning method that enhances the robustness of code summarization against poorly readable code.
arXiv Detail & Related papers (2026-01-09T02:38:24Z) - LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers [53.43862310647276]
Large language models (LLMs) excel at natural language understanding and generation but remain vulnerable to factual errors. We introduce a token-aware, layer-localized contrastive decoding method that aligns specific token types with their most influential transformer layers to improve factual generation. Our method requires no additional training or model modification, and experiments demonstrate that it consistently improves factuality across multiple LLMs and various benchmarks.
arXiv Detail & Related papers (2025-07-06T14:35:43Z) - Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective [18.077009146950473]
We study how transformers form task vectors during pretraining and how their task encoding quality predicts ICL task performance. Our empirical insights shed light on the success and failure modes of large language models via their representations.
arXiv Detail & Related papers (2024-12-16T19:00:18Z) - Efficient Sample-Specific Encoder Perturbations [37.84914870036184]
We show that a small proxy network can be used to find a sample-by-sample perturbation of the encoder output of a frozen foundation model.
Results show consistent improvements in performance as evaluated by COMET and WER.
arXiv Detail & Related papers (2024-05-01T08:55:16Z) - A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models.
Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z) - Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z) - Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z) - Evaluating the Robustness to Instructions of Large Language Models [6.947956990248856]
Fine-tuning Large Language Models (LLMs) can boost their zero-shot capabilities on novel tasks.
We evaluate six models, including Alpaca, Vicuna, WizardLM, and traditional task-oriented models (Flan-T5-XL/XXL, T0++).
We find that FLAN-T5 models at different scales are less robust to RE instructions than to QA instructions.
arXiv Detail & Related papers (2023-08-28T04:57:07Z) - Controlled Text Generation using T5 based Encoder-Decoder Soft Prompt Tuning and Analysis of the Utility of Generated Text in AI [2.381686610905853]
We introduce a novel soft prompt tuning method that uses soft prompts at both the encoder and decoder levels of a T5 model. We also investigate the feasibility of steering the output of this extended soft-prompted T5 model at the decoder level.
arXiv Detail & Related papers (2022-12-06T12:31:53Z) - Non-Autoregressive Translation by Learning Target Categorical Codes [59.840510037250944]
We propose CNAT, which implicitly learns categorical codes as latent variables for non-autoregressive decoding.
Experimental results show that our model achieves comparable or better performance on machine translation tasks.
arXiv Detail & Related papers (2021-03-21T14:12:34Z)