Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
- URL: http://arxiv.org/abs/2404.03862v3
- Date: Thu, 14 Nov 2024 18:27:39 GMT
- Title: Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
- Authors: Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi
- Abstract summary: We develop models that quote verbatim statements from trusted sources in their pre-training data.
The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora.
Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models.
- Score: 48.409306245463
- Abstract: To trust the fluent generations of large language models (LLMs), humans must be able to verify their correctness against trusted, external sources. Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, enhance verifiability but provide no guarantees on their correctness. To address these limitations, we tackle the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote verbatim statements from trusted sources in their pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora. We leverage this tool to design a reward function to quantify quotes in model responses, and curate datasets for preference learning. Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models while maintaining response quality. Quote-Tuning is applicable in different tasks, generalizes to out-of-domain data and diverse model families, and provides additional benefits to truthfulness. Our method not only serves as a hassle-free method to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.
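To make the method concrete, the following is a minimal sketch of a quoting reward built on exact n-gram membership against a trusted corpus. The hashed n-gram set stands in for the paper's fast membership inference tool, and the function names, n-gram length, and toy corpus are illustrative assumptions rather than the authors' implementation.

```python
from typing import Iterable

def build_ngram_index(corpus: Iterable[str], n: int = 25) -> set[str]:
    """Index every character n-gram of the trusted corpus.
    A plain Python set stands in for the paper's fast membership-inference
    structure; any exact-membership index would do."""
    index: set[str] = set()
    for doc in corpus:
        for i in range(len(doc) - n + 1):
            index.add(doc[i:i + n])
    return index

def quote_reward(response: str, index: set[str], n: int = 25) -> float:
    """Fraction of the response's character n-grams found verbatim in the
    trusted corpus, used here as a scalar quoting reward."""
    grams = [response[i:i + n] for i in range(len(response) - n + 1)]
    if not grams:
        return 0.0
    return sum(g in index for g in grams) / len(grams)

# Toy usage: score two sampled responses; in a preference-learning setup the
# higher-reward response would be paired as "chosen" against the lower one.
corpus = ["The Eiffel Tower is located on the Champ de Mars in Paris."]
index = build_ngram_index(corpus)
candidates = [
    "The Eiffel Tower is located on the Champ de Mars in Paris.",
    "The Eiffel Tower stands somewhere in central Paris.",
]
chosen = max(candidates, key=lambda c: quote_reward(c, index))
```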
Related papers
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We hold that retrieval-augmented language models have the inherent capability to supply responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take the first step towards aligning retrieval-augmented language models so that they respond relying solely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z) - Learning Fine-Grained Grounded Citations for Attributed Large Language Models [44.79328335487421]
FRONT is a training framework designed to teach large language models (LLMs) to generate Fine-Grained Grounded Citations.
Experiments on the ALCE benchmark demonstrate the efficacy of FRONT in generating superior grounded responses and highly supportive citations.
arXiv Detail & Related papers (2024-08-08T16:28:22Z) - Learning to Generate Answers with Citations via Factual Consistency Models [28.716998866121923]
Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations.
This paper proposes a weakly-supervised fine-tuning method leveraging factual consistency models (FCMs).
Focused learning is integrated into the objective, directing the fine-tuning process to emphasise the factual unit tokens (a minimal sketch of such a token-weighted objective appears after this related-papers list).
arXiv Detail & Related papers (2024-06-19T00:40:19Z) - Verifiable Generation with Subsentence-Level Fine-Grained Citations [13.931548733211436]
Verifiable generation requires large language models to cite source documents supporting their outputs.
Previous work mainly targets the generation of sentence-level citations, lacking specificity about which parts of a sentence are backed by the cited sources.
This work studies verifiable generation with subsentence-level fine-grained citations for more precise location of generated content supported by the cited sources.
arXiv Detail & Related papers (2024-06-10T09:32:37Z) - Calibrating the Confidence of Large Language Models by Eliciting Fidelity [52.47397325111864]
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless.
Post-alignment, these language models often exhibit overconfidence: the confidence they express is not well calibrated to their actual correctness rate.
We propose a plug-and-play method to estimate the confidence of language models.
arXiv Detail & Related papers (2024-04-03T11:36:12Z) - Source-Aware Training Enables Knowledge Attribution in Language Models [81.13048060332775]
Intrinsic source citation can enhance transparency, interpretability, and verifiability.
Our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's perplexity.
arXiv Detail & Related papers (2024-04-01T09:39:38Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Trusted Source Alignment in Large Language Models [30.14375102262399]
We present FactCheckQA, a trusted source alignment (TSA) evaluation dataset based on a corpus of fact-checking articles.
We find that as we scale up the model size, the model performance on FactCheckQA improves from near-random to up to 80% balanced accuracy in aligning with trusted sources.
arXiv Detail & Related papers (2023-11-12T00:25:25Z) - Unsupervised Pretraining for Fact Verification by Language Model Distillation [4.504050940874427]
We propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework.
It distils self-supervised features into high-quality claim-fact alignments without the need for annotations.
This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments.
arXiv Detail & Related papers (2023-09-28T15:53:44Z) - Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system.
It generates a ranked list of quotable paragraphs and spans of tokens from a given source document.
We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)
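For the "focused learning" objective mentioned in the factual-consistency-model entry above, here is a minimal sketch of a token-weighted fine-tuning loss. It is an illustration under assumptions, not that paper's exact formulation: the factual_mask input (which would normally be derived from an FCM) and the factual_weight constant are hypothetical.

```python
import torch
import torch.nn.functional as F

def focused_lm_loss(logits: torch.Tensor,
                    labels: torch.Tensor,
                    factual_mask: torch.Tensor,
                    factual_weight: float = 2.0) -> torch.Tensor:
    """Token-level cross-entropy that up-weights tokens flagged as part of
    factual units (e.g., by a factual consistency model).

    logits:       (batch, seq, vocab) next-token scores
    labels:       (batch, seq) target token ids
    factual_mask: (batch, seq) float mask, 1.0 on factual-unit tokens
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).reshape(labels.shape)
    # Ordinary tokens keep weight 1.0; factual-unit tokens get factual_weight.
    weights = 1.0 + (factual_weight - 1.0) * factual_mask
    return (weights * per_token).sum() / weights.sum()

# Toy shapes: batch of 2 sequences of length 5 over a 100-token vocabulary.
logits = torch.randn(2, 5, 100)
labels = torch.randint(0, 100, (2, 5))
factual_mask = torch.tensor([[0., 1., 1., 0., 0.],
                             [0., 0., 1., 1., 1.]])
loss = focused_lm_loss(logits, labels, factual_mask)
```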