Related papers: PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise

URL: http://arxiv.org/abs/2511.01359v1
Date: Mon, 03 Nov 2025 09:07:44 GMT
Title: PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise
Authors: Sapir Harary, Eran Hirsch, Aviv Slobodkin, David Wan, Mohit Bansal, Ido Dagan
Abstract summary: MiniTruePrefixes is a novel specialized model that better detects factual inconsistencies over text prefixes. We show that integrating MiniTruePrefixes into a controlled decoding framework substantially improves factual consistency in abstractive summarization.
Score: 60.63315470285562
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Natural Language Inference (NLI) models have been used in various ways to improve the factuality of LLM outputs. This is typically done by applying an NLI model to judge whether the model output is entailed from the supposed evidence, triggering some corrective actions, such as beam reranking at inference time or RL rewards during training. While NLI models are trained to detect factual inconsistencies over complete sentences, decisions in the common autoregressive generation architecture are made for each evolving text prefix, during decoding. Addressing this setting, we generalize the entailment detection task to apply over arbitrary text prefixes, and suggest its utility for improving generation faithfulness. Providing suitable evaluation and training datasets for this task, we train MiniTruePrefixes, a novel specialized model that better detects factual inconsistencies over text prefixes, outperforming comparable baseline NLI models by 5-14 F1 points in prefix-level entailment. We further demonstrate that integrating MiniTruePrefixes into a controlled decoding framework substantially improves factual consistency in abstractive summarization. When guided by MiniTruePrefixes, LLaMA-3.2-3B-Instruct matches the faithfulness and runtime of the 8B model from the same model family, while using only half the memory.
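The idea of scoring each evolving prefix during decoding can be illustrated with a minimal sketch. This is not the authors' implementation: MiniTruePrefixes is a trained model, whereas the scorer below (`toy_prefix_entailment`) is a hypothetical token-overlap stand-in, and the names `guided_greedy_decode`, `TOY_LM`, and `toy_propose` are assumptions introduced only for illustration. The sketch shows greedy decoding in which each candidate token's log-probability is combined with an entailment score for the prefix that token would produce.

```python
from typing import Callable, Dict, List, Tuple

Proposer = Callable[[List[str]], Dict[str, float]]
Scorer = Callable[[str, List[str]], float]


def toy_prefix_entailment(source: str, prefix: List[str]) -> float:
    """Hypothetical stand-in for a prefix-level NLI model.

    Returns the fraction of prefix tokens that also occur in the source.
    A real system would call a trained prefix-entailment classifier here.
    """
    if not prefix:
        return 1.0
    source_tokens = set(source.lower().split())
    supported = sum(1 for tok in prefix if tok.lower() in source_tokens)
    return supported / len(prefix)


def guided_greedy_decode(
    source: str,
    propose: Proposer,
    scorer: Scorer,
    max_len: int = 10,
    weight: float = 1.0,
) -> List[str]:
    """Greedy decoding that adds a weighted prefix-entailment score
    to each candidate token's log-probability before picking one."""
    prefix: List[str] = []
    for _ in range(max_len):
        candidates = propose(prefix)
        if not candidates:
            break
        best_token, best_score = "", float("-inf")
        for token, logprob in candidates.items():
            # Score the prefix as it would look after appending this token.
            score = logprob + weight * scorer(source, prefix + [token])
            if score > best_score:
                best_token, best_score = token, score
        if best_token == "<eos>":
            break
        prefix.append(best_token)
    return prefix


# Toy "language model": maps a prefix to candidate next tokens with log-probs.
TOY_LM: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"the": -0.1, "a": -0.5},
    ("the",): {"dog": -0.2, "cat": -0.4},
    ("the", "cat"): {"slept": -0.2, "sat": -0.3},
    ("the", "cat", "sat"): {"<eos>": 0.0},
}


def toy_propose(prefix: List[str]) -> Dict[str, float]:
    return TOY_LM.get(tuple(prefix), {})


if __name__ == "__main__":
    src = "the cat sat on the mat"
    print(guided_greedy_decode(src, toy_propose, toy_prefix_entailment, weight=0.0))
    print(guided_greedy_decode(src, toy_propose, toy_prefix_entailment, weight=1.0))
```

With `weight=0.0` the toy model follows its own most likely tokens and drifts to an unsupported word; with `weight=1.0` the prefix score steers it back to tokens supported by the source. The additive combination is just one simple choice for this sketch; the paper's controlled decoding framework may combine scores differently.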

Related papers

Learning an Image Editing Model without Image Editing Pairs [83.03646586929638]
Recent image editing models have achieved impressive results while following natural language editing instructions. They rely on supervised fine-tuning with large datasets of input-target pairs. Current workarounds use synthetic training pairs that leverage the zero-shot capabilities of existing models. We present a new training paradigm that eliminates the need for paired data entirely.
arXiv Detail & Related papers (2025-10-16T17:59:57Z)
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data [13.108807408880645]
We propose a novel approach for synthetic data generation, CG2C, that leverages multi-hop reasoning on context graphs extracted from documents. Our fact checker model, FactCG, demonstrates improved performance with more connected reasoning, using the same backbone models.
arXiv Detail & Related papers (2025-01-28T18:45:07Z)
Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors [11.07539342949602]
We propose an end-to-end framework for detecting factual errors in text summarization. Our framework uses a diverse set of LLM prompts to identify factual inconsistencies. We calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination.
arXiv Detail & Related papers (2024-06-18T18:59:37Z)
A synthetic data approach for domain generalization of NLI models [13.840374911669167]
Natural Language Inference (NLI) remains an important benchmark task for LLMs. We show that synthetic high-quality datasets can adapt NLI models for zero-shot use in downstream applications. We show that models trained on this data have the best generalization to completely new downstream test settings.
arXiv Detail & Related papers (2024-02-19T18:55:16Z)
Fine-tuning Language Models for Factuality [96.5203774943198]
The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'. In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition. We present a dataset preparation method based on the granular alignment of ITN examples. One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
arXiv Detail & Related papers (2022-07-29T20:39:02Z)
Logical Reasoning with Span Predictions: Span-level Logical Atoms for Interpretable and Robust NLI Models [19.601700560645206]
Current Natural Language Inference (NLI) models achieve impressive results, sometimes outperforming humans on in-distribution test sets. We introduce a logical reasoning framework for NLI, creating highly transparent model decisions that are based on logical rules. We almost fully retain performance on SNLI while identifying the exact hypothesis spans that are responsible for each model prediction.
arXiv Detail & Related papers (2022-05-23T16:24:27Z)
Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples. We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries. We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters [35.103851212995046]
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs. We explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on. We develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset.
arXiv Detail & Related papers (2022-04-15T12:56:39Z)
Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes. An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences. The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.