Related papers: Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability

Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability

URL: http://arxiv.org/abs/2401.08574v2
Date: Wed, 26 Jun 2024 19:52:35 GMT
Title: Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability
Authors: Afra Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, Jacob Andreas,
Abstract summary: Language models (LMs) can sometimes generate factually correct text and estimate truth values of individual claims. Current LMs generate incorrect or nonsensical content, and are difficult to edit and bring up to date. We present a method called Deductive Closure Training (DCT) that uses LMs themselves to identify implications of (and contradictions within) the text that they generate.
Score: 58.582216812183496
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While language models (LMs) can sometimes generate factually correct text and estimate truth values of individual claims, these generally do not reflect a globally coherent, manipulable model of the world. As a consequence, current LMs also generate incorrect or nonsensical content, and are difficult to edit and bring up to date. We present a method called Deductive Closure Training (DCT) that uses LMs themselves to identify implications of (and contradictions within) the text that they generate, yielding an efficient self-supervised procedure for improving LM factuality. Given a collection of seed documents, DCT prompts LMs to generate additional text implied by these documents, reason globally about the correctness of this generated text, and finally fine-tune on text inferred to be correct. Given seed documents from a trusted source, DCT provides a tool for supervised model updating; if seed documents are sampled from the LM itself, DCT enables fully unsupervised fine-tuning for improved coherence and accuracy. Across the CREAK, MQUaKE, and Reversal Curse datasets, supervised DCT improves LM fact verification and text generation accuracy by 3-26%; on CREAK fully unsupervised DCT improves verification accuracy by 12%. These results show that LMs' reasoning capabilities during inference can be leveraged during training to improve their reliability.

Related papers

Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond [55.984684518346924]
We recast Knowledge Tracing as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable.<n>Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text.<n> Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories.
arXiv Detail & Related papers (2025-06-20T13:21:14Z)
Chain of Correction for Full-text Speech Recognition with Large Language Models [21.37485126269991]
Chain of Correction (CoC) for full-text error correction with Large Language Models (LLMs) CoC corrects errors segment by segment using pre-recognized text as guidance within a regular multi-turn chat format. We analyze how to set the correction threshold to balance under-correction and over-rephrasing.
arXiv Detail & Related papers (2025-04-02T09:06:23Z)
Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors [11.07539342949602]
We propose an end-to-end framework for detecting factual errors in text summarization. Our framework uses a diverse set of LLM prompts to identify factual inconsistencies. We calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination.
arXiv Detail & Related papers (2024-06-18T18:59:37Z)
CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses. We introduce CaLM, a novel verification framework. Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
Language Models with Conformal Factuality Guarantees [44.767328168194815]
Conformal factuality is a framework that can ensure high probability correctness guarantees for language model (LM) outputs. We show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees.
arXiv Detail & Related papers (2024-02-15T18:31:53Z)
Small Language Model Can Self-correct [42.76612128849389]
We introduce the underlineIntrinsic underlineSelf-underlineCorrection (ISC) in generative language models, aiming to correct the initial output of LMs in a self-triggered manner. We conduct experiments using LMs with parameters sizes ranging from 6 billion to 13 billion in two tasks, including commonsense reasoning and factual knowledge reasoning.
arXiv Detail & Related papers (2024-01-14T14:29:07Z)
Knowledge-Augmented Language Model Verification [68.6099592486075]
Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. We propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier. Our results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs.
arXiv Detail & Related papers (2023-10-19T15:40:00Z)
Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174]
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text. Previous methods use external knowledge as references for text generation to enhance factuality but often struggle with the knowledge mix-up of irrelevant references. We present DKGen, which divide the text generation process into an iterative process.
arXiv Detail & Related papers (2023-08-30T02:22:40Z)
LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback. Our focus is the code generation task, where the model produces code based on natural language instructions. LETI iteratively fine-tunes the model, using the objective LM, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
Factuality Enhanced Language Models for Open-Ended Text Generation [60.27166549575472]
We design the FactualityPrompts test set and metrics to measure the factuality of LM generations. We find that larger LMs are more factual than smaller ones, although a previous study suggests that larger LMs can be less truthful in terms of misconceptions. We propose a factuality-enhanced training method that uses TopicPrefix for better awareness of facts and sentence completion.
arXiv Detail & Related papers (2022-06-09T17:16:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.