Collaborative decoding of critical tokens for boosting factuality of
large language models
- URL: http://arxiv.org/abs/2402.17982v1
- Date: Wed, 28 Feb 2024 01:53:37 GMT
- Title: Collaborative decoding of critical tokens for boosting factuality of
large language models
- Authors: Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian and Dong Yu
- Abstract summary: Finetuned and aligned models show improved abilities of instruction following and safe generation.
The common practice of using sampling during generation also increases chances of hallucination.
We introduce a collaborative decoding framework to harness the high factuality within pretrained models through the concept of critical tokens.
- Score: 57.504894664689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The most common training pipeline for large language models includes
pretraining, finetuning and aligning phases, with their respective resulting
models, such as the pretrained model and the finetuned model. Finetuned and
aligned models show improved abilities of instruction following and safe
generation; however, their ability to stay factual about the world is
impacted by the finetuning process. Furthermore, the common practice of using
sampling during generation also increases chances of hallucination. In this
work, we introduce a collaborative decoding framework to harness the high
factuality within pretrained models through the concept of critical tokens. We
first design a critical token classifier to decide which model to use for the
next token, and subsequently generate the next token using different decoding
strategies. Experiments with different models and datasets show that our
decoding framework is able to reduce model hallucination significantly,
showcasing the importance of the collaborative decoding framework.
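A minimal sketch of how such a routing loop could look is given below, assuming Hugging-Face-style causal LMs that expose `.logits` and a hypothetical `critical_classifier`; the per-branch decoding choices (greedy decoding from the pretrained model for critical tokens, temperature sampling from the aligned model otherwise) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of collaborative decoding with a critical-token classifier.
# Model names, the classifier interface, and the per-branch decoding strategies
# (greedy for critical tokens, sampling otherwise) are illustrative assumptions.
import torch


def collaborative_decode(input_ids, pretrained_lm, aligned_lm, critical_classifier,
                         max_new_tokens=128, temperature=0.8, eos_id=2):
    """Generate a continuation, routing each decoding step to one of two models."""
    ids = input_ids
    for _ in range(max_new_tokens):
        # The classifier decides whether the upcoming token is critical (fact-bearing).
        if critical_classifier(ids):
            # Critical token: query the pretrained model and decode greedily.
            logits = pretrained_lm(ids).logits[:, -1, :]
            next_id = logits.argmax(dim=-1, keepdim=True)
        else:
            # Non-critical token: query the aligned model and sample with temperature.
            logits = aligned_lm(ids).logits[:, -1, :] / temperature
            probs = torch.softmax(logits, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)

        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == eos_id:
            break
    return ids
```

In this sketch, the aligned model still drives fluency and instruction following, while fact-bearing tokens are delegated to the pretrained model, consistent with the abstract's framing of harnessing pretrained-model factuality at critical tokens.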
Related papers
- Code Pretraining Improves Entity Tracking Abilities of Language Models [20.6768931196215]
We find clear evidence that models additionally trained on large amounts of code outperform the base models.
On the other hand, we find no consistent benefit of additional math training or alignment tuning across various model families.
arXiv Detail & Related papers (2024-05-31T17:56:33Z)
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both randomness in masking language modeling and the structure aspect of programming language.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner (a generic sketch of such a contrastive objective appears after this list).
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
EVLGen is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- What is the best recipe for character-level encoder-only modelling? [2.792030485253753]
This paper aims to benchmark recent progress in language understanding models that output contextualised representations at the character level.
We find that our best performing character-level model exceeds the performance of a token-based model trained with the same settings on the same data.
We believe our results demonstrate the readiness of character-level models for multilingual language representation, and encourage NLP practitioners to try them as drop-in replacements for token-based models.
arXiv Detail & Related papers (2023-05-09T14:00:15Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z)
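As referenced in the Code Representation Learning At Scale entry above, here is a generic sketch of a contrastive objective with in-batch negatives plus explicit hard negatives (InfoNCE-style); the embedding names, shapes, and temperature are illustrative assumptions rather than that paper's exact construction.

```python
# Generic InfoNCE-style contrastive loss with in-batch negatives plus one explicit
# hard negative per anchor. All names, shapes, and the temperature are assumptions.
import torch
import torch.nn.functional as F


def contrastive_loss(anchor, positive, hard_negative, temperature=0.05):
    """anchor, positive, hard_negative: [batch, dim] embeddings from a code encoder."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negative = F.normalize(hard_negative, dim=-1)

    # Similarity of each anchor to every positive in the batch (diagonal = true pair) ...
    batch_sim = anchor @ positive.t()                          # [B, B]
    # ... plus one extra column for each anchor's own hard negative.
    hard_sim = (anchor * hard_negative).sum(-1, keepdim=True)  # [B, 1]

    logits = torch.cat([batch_sim, hard_sim], dim=-1) / temperature  # [B, B+1]
    labels = torch.arange(anchor.size(0), device=anchor.device)      # index of the true pair
    return F.cross_entropy(logits, labels)
```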