FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models
- URL: http://arxiv.org/abs/2601.11232v1
- Date: Fri, 16 Jan 2026 12:23:58 GMT
- Title: FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models
- Authors: Javier Carnerero-Cano, Massimiliano Pronesti, Radu Marinescu, Tigran Tchrakian, James Barry, Jasmina Gajcin, Yufang Hou, Alessandra Pascale, Elizabeth Daly
- Abstract summary: Large language models (LLMs) are widely used in knowledge-intensive applications but often generate factually incorrect responses. We introduce FactCorrector, a new post-hoc correction method that adapts across domains without retraining. Experiments on VELI5 and several popular long-form factuality datasets show that the FactCorrector approach significantly improves factual precision while preserving relevance.
- Score: 47.782867391739195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are widely used in knowledge-intensive applications but often generate factually incorrect responses. A promising approach to rectify these flaws is correcting LLMs using feedback. Therefore, in this paper, we introduce FactCorrector, a new post-hoc correction method that adapts across domains without retraining and leverages structured feedback about the factuality of the original response to generate a correction. To support rigorous evaluations of factuality correction methods, we also develop the VELI5 benchmark, a novel dataset containing systematically injected factual errors and ground-truth corrections. Experiments on VELI5 and several popular long-form factuality datasets show that the FactCorrector approach significantly improves factual precision while preserving relevance, outperforming strong baselines. We release our code at https://ibm.biz/factcorrector.
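The abstract outlines the recipe at a high level (break the draft into checkable claims, gather structured factuality feedback on each, then regenerate a corrected response) without implementation details, so the following Python sketch is only a plausible reading of that pipeline; the helpers `extract_claims`, `verify_claim`, and `llm` are hypothetical placeholders, not the released FactCorrector API at https://ibm.biz/factcorrector.

```python
# Minimal sketch of a post-hoc factuality-correction loop in the spirit of
# FactCorrector: decompose -> verify -> structured feedback -> rewrite.
# All helper names below are hypothetical placeholders, not the released API.
from dataclasses import dataclass

@dataclass
class ClaimFeedback:
    claim: str      # atomic factual claim extracted from the draft response
    verdict: str    # "supported" | "refuted" | "unverifiable"
    evidence: str   # snippet the verdict is based on

def correct_response(question: str, response: str, llm, extract_claims, verify_claim) -> str:
    """Post-hoc correction without retraining: verify each atomic claim, then
    ask the LLM to rewrite the draft using the structured feedback."""
    feedback = [
        ClaimFeedback(claim=c, **verify_claim(c))   # verify_claim -> {"verdict": ..., "evidence": ...}
        for c in extract_claims(response)
    ]
    report = "\n".join(
        f"- CLAIM: {f.claim}\n  VERDICT: {f.verdict}\n  EVIDENCE: {f.evidence}"
        for f in feedback
    )
    prompt = (
        f"Question: {question}\n"
        f"Draft answer: {response}\n\n"
        "Fact-check report on the draft:\n"
        f"{report}\n\n"
        "Rewrite the draft so that refuted or unverifiable claims are fixed or "
        "removed, keeping everything that is supported and relevant."
    )
    return llm(prompt)
```

The point of passing a per-claim report rather than a single scalar score is that the rewriting model can make targeted edits while leaving supported, relevant content untouched, which matches the abstract's claim of improved factual precision with preserved relevance.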
Related papers
- PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise [60.63315470285562]
MiniTruePrefixes is a novel specialized model that better detects factual inconsistencies over text prefixes. We show that integrating MiniTruePrefixes into a controlled decoding framework substantially improves factual consistency in abstractive summarization.
arXiv Detail & Related papers (2025-11-03T09:07:44Z) - Graph-based Confidence Calibration for Large Language Models [22.394717844099684]
We propose using an auxiliary learning model to assess response correctness based on the self-consistency of multiple outputs generated by the large language models. Our method builds a consistency graph to represent the agreement among multiple responses and uses a graph neural network (GNN) to estimate the likelihood that each response is correct.
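The summary names the two ingredients, a consistency graph over sampled responses and a GNN that scores each node, but not how they are wired together. The numpy sketch below is one minimal way to realise the idea, assuming edge weights come from some pairwise agreement score and that the GNN weights (`w1`, `w2`) have already been learned elsewhere.

```python
# Sketch of a consistency graph over n sampled responses, scored with a single
# GCN-style message-passing step. In the paper the GNN is trained; here the
# weights are placeholders that only illustrate the data flow.
import numpy as np

def consistency_scores(responses, agreement, features, w1, w2):
    """responses: list[str]; agreement(a, b) -> [0, 1] pairwise consistency
    (e.g. an NLI or embedding-similarity score, assumed given);
    features: (n, d) node features; w1: (d, h), w2: (h,) learned weights."""
    n = len(responses)
    adj = np.eye(n)                                   # self-loops
    for i in range(n):
        for j in range(i + 1, n):
            adj[i, j] = adj[j, i] = agreement(responses[i], responses[j])
    deg = adj.sum(axis=1, keepdims=True)
    adj_norm = adj / deg                              # row-normalised aggregation
    hidden = np.maximum(adj_norm @ features @ w1, 0)  # aggregate neighbours + ReLU
    logits = hidden @ w2
    return 1.0 / (1.0 + np.exp(-logits))              # per-response correctness prob.
```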
arXiv Detail & Related papers (2024-11-03T20:36:44Z) - FactAlign: Long-form Factuality Alignment of Large Language Models [35.067998820937284]
Large language models have demonstrated significant potential as next-generation information access engines.
We propose FactAlign, a novel alignment framework designed to enhance the factuality of long-form responses.
Our experiments on open-domain prompts and information-seeking questions demonstrate that FactAlign significantly improves the factual accuracy of LLM responses.
arXiv Detail & Related papers (2024-10-02T16:03:13Z) - Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z) - Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
It combines a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word with a corrector module that applies the corrections returned by the tagger.
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
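Read literally, this is a two-stage pipeline: the tagger emits one edit tag per hypothesis word and the corrector applies them. Below is a minimal sketch of the apply step, under an assumed KEEP / DELETE / REPLACE:<word> tag set; the paper's actual tag inventory may differ.

```python
# Sketch of the "tag and correct" apply-step: the tagger emits one edit tag
# per ASR hypothesis word; the corrector applies them. The tag set
# (KEEP / DELETE / REPLACE:<word>) is an assumed simplification.
def apply_tags(hypothesis_words, tags):
    corrected = []
    for word, tag in zip(hypothesis_words, tags):
        if tag == "KEEP":
            corrected.append(word)
        elif tag == "DELETE":
            continue                                  # drop the word entirely
        elif tag.startswith("REPLACE:"):
            corrected.append(tag.split(":", 1)[1])    # substitute the tagged word
    return " ".join(corrected)

# Example:
# apply_tags(["their", "going", "home"], ["REPLACE:they're", "KEEP", "KEEP"])
# -> "they're going home"
```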
arXiv Detail & Related papers (2024-06-11T09:52:33Z) - Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs).
This work explores whether small (≤ 13B) language models (LMs) have the ability to self-correct on reasoning tasks with minimal inputs from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z) - Calibrating Long-form Generations from Large Language Models [34.72041258464477]
Large language models' (LLMs) confidence scores should align with the actual likelihood of their responses being correct.
Current confidence elicitation methods and calibration metrics rely on a binary true/false assessment of response correctness.
We introduce a unified calibration framework, in which both the correctness of the LLMs' responses and their associated confidence levels are treated as distributions across a range of scores.
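The summary leaves open how two score distributions are actually compared, so the snippet below shows just one plausible instantiation: a Wasserstein-1 gap between each response's confidence distribution and its correctness-score distribution over a shared grid of bins. It is illustrative only and should not be taken as the paper's metric.

```python
# One plausible distribution-level calibration measure: per response, compare
# the confidence distribution against the correctness-score distribution via
# Wasserstein-1 (sum of absolute CDF differences, assuming unit-spaced bins).
import numpy as np

def calibration_gap(confidence_dists, correctness_dists):
    """Both args: (n_responses, n_bins) arrays of probabilities over the same
    ordered score bins (each row sums to 1). Returns the mean W1 distance."""
    conf_cdf = np.cumsum(confidence_dists, axis=1)
    corr_cdf = np.cumsum(correctness_dists, axis=1)
    per_response_w1 = np.abs(conf_cdf - corr_cdf).sum(axis=1)
    return per_response_w1.mean()
```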
arXiv Detail & Related papers (2024-02-09T17:00:32Z) - Lyra: Orchestrating Dual Correction in Automated Theorem Proving [63.115422781158934]
Lyra is a new framework that employs two distinct correction mechanisms: Tool Correction and Conjecture Correction.
Tool Correction contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof.
Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts.
arXiv Detail & Related papers (2023-09-27T17:29:41Z) - Converge to the Truth: Factual Error Correction via Iterative Constrained Editing [30.740281040892086]
We propose VENCE, a novel method for factual error correction (FEC) with minimal edits.
VENCE formulates the FEC problem as iterative sampling of editing actions with respect to a target density function.
Experiments on a public dataset show that VENCE improves the well-adopted SARI metric by 5.3 (or a relative improvement of 11.8%) over the previous best distantly-supervised methods.
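"Iterative sampling of editing actions with respect to a target density" is the shape of a Metropolis-Hastings loop, so the sketch below shows that control flow; the proposal and density functions are stand-ins for VENCE's actual components (e.g. its verification model), which are not reproduced here.

```python
# Sketch of density-guided iterative editing in the spirit of VENCE: propose a
# local edit and accept it Metropolis-Hastings style against a target density.
# Both callables are hypothetical stand-ins, not the paper's components.
import random

def iterative_correct(sentence, propose_edit, target_density, steps=50):
    """propose_edit(s) -> candidate sentence with one token-level edit;
    target_density(s) -> unnormalised score, e.g. fluency * factual support."""
    current, current_d = sentence, target_density(sentence)
    for _ in range(steps):
        candidate = propose_edit(current)
        candidate_d = target_density(candidate)
        accept_prob = min(1.0, candidate_d / max(current_d, 1e-12))
        if random.random() < accept_prob:
            current, current_d = candidate, candidate_d
    return current
```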
arXiv Detail & Related papers (2022-11-22T10:03:13Z) - Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
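The RFEC entry above is concrete enough to sketch end to end: retrieve evidence sentences by similarity to the summary, then swap summary entities that never occur in the evidence for same-type entities that do. The `extract_entities` helper (returning (text, type) pairs) is an assumed NER component, not the paper's code, and lexical Jaccard overlap stands in for whatever retrieval scoring RFEC actually uses.

```python
# Sketch of an entity-level correction pass in the spirit of RFEC.
def correct_entities(summary, document_sentences, extract_entities, top_k=3):
    def overlap(a, b):
        # Jaccard word overlap as a stand-in retrieval score.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1)

    # 1) Retrieve evidence sentences most similar to the summary.
    evidence = sorted(document_sentences, key=lambda s: overlap(s, summary),
                      reverse=True)[:top_k]
    evidence_entities = [e for s in evidence for e in extract_entities(s)]

    # 2) Replace summary entities unsupported by the evidence with a
    #    same-type entity that does appear in the evidence.
    corrected = summary
    for text, etype in extract_entities(summary):
        supported = any(text == ev_text for ev_text, _ in evidence_entities)
        if not supported:
            candidates = [ev_text for ev_text, ev_type in evidence_entities
                          if ev_type == etype]
            if candidates:
                corrected = corrected.replace(text, candidates[0])
    return corrected
```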