Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models
- URL: http://arxiv.org/abs/2404.09338v1
- Date: Sun, 14 Apr 2024 19:45:35 GMT
- Title: Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models
- Authors: Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu
- Abstract summary: Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination.
Recent work has focused on decoding techniques to improve factuality during inference.
- Score: 55.45444773200529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve factuality during inference by leveraging LLMs' hierarchical representation of factual knowledge, manipulating the predicted distributions at inference time. Current state-of-the-art approaches refine decoding by contrasting early-exit distributions from a lower layer with the final layer to exploit information related to factuality within the model forward procedure. However, such methods often assume the final layer is the most reliable and the lower layer selection process depends on it. In this work, we first propose extrapolation of critical token probabilities beyond the last layer for more accurate contrasting. We additionally employ layer-wise entropy-guided lower layer selection, decoupling the selection process from the final layer. Experiments demonstrate strong performance - surpassing state-of-the-art on multiple different datasets by large margins. Analyses show different kinds of prompts respond to different selection strategies.
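The following is a minimal, self-contained sketch of the kind of decoding step the abstract describes: an entropy-guided choice of the lower (premature) layer, a linear extrapolation of token log-probabilities one step beyond the final layer, and a contrastive score restricted to plausible tokens. The concrete rules used here (highest-entropy candidate layer, least-squares extrapolation over the last few layers, a DoLa-style plausibility mask) are illustrative assumptions, not the paper's exact procedure.
```python
# Sketch of one decoding step: entropy-guided lower-layer selection,
# extrapolation of log-probabilities beyond the final layer, and
# layer-contrastive scoring. All specific rules are assumptions.
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)

def contrastive_next_token(layer_logits, candidate_layers, extrap_window=4, alpha=0.1):
    """layer_logits: (num_layers, vocab) logits obtained by applying the LM head
    to every layer's hidden state at the current position."""
    probs = softmax(layer_logits)                        # (L, V)
    final_p = probs[-1]

    # 1) Entropy-guided lower-layer selection, independent of the final layer:
    #    pick the candidate layer with the highest-entropy distribution (assumption).
    ents = entropy(probs[candidate_layers])
    lower = candidate_layers[int(np.argmax(ents))]

    # 2) Extrapolate per-token log-probabilities one step past the last layer
    #    by fitting a line over the last `extrap_window` layers.
    xs = np.arange(extrap_window)
    logp = np.log(probs[-extrap_window:] + 1e-12)        # (W, V)
    slope = np.polyfit(xs, logp, 1)[0]                   # (V,)
    extrap_logp = logp[-1] + slope                       # one "virtual" layer beyond the top

    # 3) Contrast the extrapolated mature distribution with the lower layer,
    #    restricted to tokens that are plausible under the final layer.
    plausible = final_p >= alpha * final_p.max()
    score = np.where(plausible, extrap_logp - np.log(probs[lower] + 1e-12), -np.inf)
    return int(np.argmax(score))

# Toy demo with random logits standing in for a real 32-layer model.
rng = np.random.default_rng(0)
print(contrastive_next_token(rng.normal(size=(33, 1000)), candidate_layers=[4, 8, 12, 16]))
```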
Related papers
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
- Layer by Layer: Uncovering Hidden Representations in Language Models [28.304269706993942]
We show that intermediate layers can encode even richer representations, often improving performance on a wide range of downstream tasks.
Our framework highlights how each model layer balances information compression and signal preservation.
These findings challenge the standard focus on final-layer embeddings and open new directions for model analysis and optimization.
arXiv Detail & Related papers (2025-02-04T05:03:42Z)
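A small sketch of the layer-wise probing idea behind the "Layer by Layer" entry above: pool hidden states from every layer and fit a linear probe per layer to see which depth carries the most useful representation. The mean pooling, the logistic-regression probe, the toy data, and the use of "gpt2" as a stand-in model are assumptions for illustration.
```python
# Layer-wise linear probing: which layer's pooled representation is most useful?
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def layer_features(texts):
    """Mean-pooled features for every layer: dict layer -> (N, hidden) array."""
    feats = {}
    for text in texts:
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            hs = model(**ids).hidden_states          # embeddings + one tensor per block
        for layer, h in enumerate(hs):
            feats.setdefault(layer, []).append(h.mean(dim=1).squeeze(0))
    return {l: torch.stack(v).numpy() for l, v in feats.items()}

# Probe every layer on a tiny toy sentiment set (train accuracy only, for illustration).
texts = ["the movie was great", "terrible, a waste of time", "loved it", "awful acting"]
labels = [1, 0, 1, 0]
for layer, X in layer_features(texts).items():
    acc = LogisticRegression(max_iter=1000).fit(X, labels).score(X, labels)
    print(f"layer {layer:2d}: train accuracy {acc:.2f}")
```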
- InnerThoughts: Disentangling Representations and Predictions in Large Language Models [20.39568933276831]
We propose to learn a small separate neural network predictor module on a collection of training questions.
In effect, such a framework disentangles the representational abilities of LLMs from their predictive abilities.
arXiv Detail & Related papers (2025-01-29T21:01:44Z)
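A hedged sketch of the idea summarized in the InnerThoughts entry above: keep the LLM frozen as a representation extractor and train only a small predictor head on its hidden states. The layer selection, pooling, and head architecture below are illustrative assumptions rather than the paper's specification.
```python
# Frozen LLM as feature extractor + small trainable answer head.
import torch
from torch import nn

class AnswerHead(nn.Module):
    def __init__(self, hidden_size: int, num_layers_used: int, num_choices: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size * num_layers_used, 256),
            nn.GELU(),
            nn.Linear(256, num_choices),
        )

    def forward(self, layer_states):               # (batch, layers_used, hidden)
        return self.net(layer_states.flatten(1))   # logits over answer choices

# Train only the head; the LLM that produced `states` stays frozen elsewhere.
states = torch.randn(32, 4, 768)      # stand-in for cached last-token hidden states
labels = torch.randint(0, 4, (32,))   # stand-in multiple-choice labels
head = AnswerHead(hidden_size=768, num_layers_used=4, num_choices=4)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(states), labels)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```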
- Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused [44.37155553647802]
Large Language Models (LLMs) have demonstrated exceptional performance across various natural language processing tasks.
They occasionally yield content that is factually inaccurate or discordant with the expected output.
Recent works have investigated contrastive decoding between the original model and an amateur model with induced hallucination.
We introduce a novel contrastive decoding framework termed LOL (LOwer Layer Matters).
arXiv Detail & Related papers (2024-08-16T14:23:59Z)
- Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Model (LLM) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
- Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z)
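A rough sketch of the intermediate-layer decoding idea from the LITE entry above: during instruction tuning, add language-modeling losses on selected intermediate layers (sharing the final LM head) so those layers can later serve as early exits. The chosen exit layers and the uniform loss weighting are assumptions.
```python
# Add LM losses on intermediate layers so they become usable early exits.
import torch
import torch.nn.functional as F
from torch import nn

def multi_exit_lm_loss(hidden_states, lm_head, labels, exit_layers=(8, 16, 24)):
    """hidden_states: tuple of (batch, seq, hidden) tensors (embeddings + one per layer);
    lm_head: the model's output projection, shared across all exits."""
    layers = list(exit_layers) + [len(hidden_states) - 1]    # always keep the final layer
    losses = []
    for layer in layers:
        logits = lm_head(hidden_states[layer])
        losses.append(F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),     # next-token prediction
            labels[:, 1:].reshape(-1),
            ignore_index=-100,
        ))
    return torch.stack(losses).mean()                        # uniform weighting (assumption)

# Toy check with random tensors standing in for a real model's forward pass.
B, S, H, V = 2, 16, 512, 1000
hs = tuple(torch.randn(B, S, H) for _ in range(33))
print(multi_exit_lm_loss(hs, nn.Linear(H, V, bias=False), torch.randint(0, V, (B, S))))
```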
- The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers [14.596485032985328]
We argue that due to the gradual improvement across model layers, additional information can be gleaned from the contrast between higher and lower layers during inference.
We propose a novel approach that utilizes the contrast between layers to improve text generation outputs.
arXiv Detail & Related papers (2023-05-02T17:42:37Z)
- Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [52.656743602538825]
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing.
arXiv Detail & Related papers (2023-04-11T13:34:13Z)
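A minimal sketch of layer freezing for efficient fine-tuning, the general idea behind Telly in the entry above. The cutoff layer and the use of microsoft/codebert-base (with RoBERTa-style parameter names) are assumptions for illustration, not the paper's exact configuration.
```python
# Freeze the embeddings and the lower encoder layers, fine-tune the rest.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)

FREEZE_BELOW = 8  # freeze embeddings and encoder layers 0..7 (an assumption)
for name, param in model.named_parameters():
    if name.startswith("roberta.embeddings"):
        param.requires_grad = False
    elif name.startswith("roberta.encoder.layer."):
        layer_idx = int(name.split("roberta.encoder.layer.")[1].split(".")[0])
        if layer_idx < FREEZE_BELOW:
            param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable / total:.1%} of {total:,}")
```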
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
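For the Dense Contrastive Visual-Linguistic Pretraining entry above, a generic sketch of a symmetric cross-modality contrastive (InfoNCE-style) loss between region and text features; the temperature and pairing scheme are illustrative assumptions rather than the paper's exact objective.
```python
# Symmetric InfoNCE between aligned region and text features.
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_feats, text_feats, temperature=0.07):
    """region_feats, text_feats: (N, d); row i of each forms a positive pair."""
    r = F.normalize(region_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = r @ t.T / temperature            # (N, N) cosine similarities / temperature
    targets = torch.arange(r.size(0))         # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage with random features standing in for real region/text encoders.
print(float(region_text_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))))
```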