Related papers: Decomposing and Revising What Language Models Generate

Decomposing and Revising What Language Models Generate

URL: http://arxiv.org/abs/2509.00765v1
Date: Sun, 31 Aug 2025 09:26:25 GMT
Title: Decomposing and Revising What Language Models Generate
Authors: Zhichao Yan, Jiaoyan Chen, Jiapu Wang, Xiaoli Li, Ru Li, Jeff Z. Pan,
Abstract summary: We propose a new fact decomposition-based framework called FIDES for attributed QA.<n>Fides uses a contextually enhanced two-stage faithful decomposition method to decompose long form answers into sub-facts.<n>If the retrieved evidence snippets conflict with the related sub-facts, such sub-facts will be revised accordingly.<n>Fides outperforms the SOTA methods by over 14% in average with GPT-3.5-turbo, Gemini and Llama 70B series.
Score: 29.67882325906939
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Attribution is crucial in question answering (QA) with Large Language Models (LLMs).SOTA question decomposition-based approaches use long form answers to generate questions for retrieving related documents. However, the generated questions are often irrelevant and incomplete, resulting in a loss of facts in retrieval.These approaches also fail to aggregate evidence snippets from different documents and paragraphs. To tackle these problems, we propose a new fact decomposition-based framework called FIDES (\textit{faithful context enhanced fact decomposition and evidence aggregation}) for attributed QA. FIDES uses a contextually enhanced two-stage faithful decomposition method to decompose long form answers into sub-facts, which are then used by a retriever to retrieve related evidence snippets. If the retrieved evidence snippets conflict with the related sub-facts, such sub-facts will be revised accordingly. Finally, the evidence snippets are aggregated according to the original sentences.Extensive evaluation has been conducted with six datasets, with an additionally proposed new metric called $Attr_{auto-P}$ for evaluating the evidence precision. FIDES outperforms the SOTA methods by over 14\% in average with GPT-3.5-turbo, Gemini and Llama 70B series.

Related papers

Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering [0.3376269351435395]
Large-scale digitization initiatives have unlocked massive collections of historical newspapers.<n>We develop and evaluate a multilingual Retrieval-Augmented Generation pipeline specifically designed for question answering on noisy historical documents.
arXiv Detail & Related papers (2025-12-14T13:57:05Z)
Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models [64.49342399229529]
We argue that post-hoc attribution can be reframed as a reasoning problem, where answers are decomposed into constituent units, each tied to specific context.<n>We introduce DecompTune, a post-training method that teaches models to produce answer decompositions as intermediate reasoning steps.<n>Across extensive experiments and ablations, DecompTune substantially improves attribution quality, outperforming prior methods and matching or exceeding state-of-the-art frontier models.
arXiv Detail & Related papers (2025-10-29T17:58:59Z)
VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution.<n>We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z)
Question Decomposition for Retrieval-Augmented Generation [2.6409776648054764]
We propose a RAG pipeline that incorporates question decomposition into sub-questions.<n>We show that question decomposition effectively assembles complementary documents, while reranking reduces noise.<n>Although reranking itself is standard, we show that pairing an off-the-shelf cross-encoder reranker with LLM-driven question decomposition bridges the retrieval gap on multi-hop questions.
arXiv Detail & Related papers (2025-07-01T01:01:54Z)
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models [53.17363502535395]
Trustworthy language models should provide both correct and verifiable answers.<n>Current systems insert citations by querying an external retriever at inference time.<n>We propose Active Indexing, which continually pretrains on synthetic QA pairs.
arXiv Detail & Related papers (2025-06-21T04:48:05Z)
Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.13261761812517]
We introduce FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a novel method for hallucination detection in RAG outputs.<n>We present a new long-form Question Answering (QA) dataset annotated for both factuality and faithfulness.
arXiv Detail & Related papers (2025-05-27T11:56:59Z)
Evidence Contextualization and Counterfactual Attribution for Conversational QA over Heterogeneous Data with RAG Systems [4.143039012104666]
Retrieval Augmented Generation (RAG) works as a backbone for interacting with an enterprise's own data via Conversational Question Answering (ConvQA)<n>In this work, we demonstrate RAGONITE, a RAG system that remedies the above concerns by: (i) contextualizing evidence with source metadata and surrounding text; and (ii) computing counterfactual attribution.
arXiv Detail & Related papers (2024-12-13T21:28:17Z)
Atomic Fact Decomposition Helps Attributed Question Answering [30.75332718824254]
Attributed Question Answering (AQA) aims to provide both a trustworthy answer and a reliable attribution report for a question. This paper proposes an Atomic fact decomposition-based Retrieval and Editing framework. It decomposes the generated long-form answers into molecular clauses and atomic facts by the instruction-tuned LLMs.
arXiv Detail & Related papers (2024-10-22T05:25:54Z)
ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
We study out-of-scope questions, where the retrieved document appears semantically similar to the question but lacks the necessary information to answer it.<n>We propose a guided hallucination-based approach ELOQ to automatically generate a diverse set of out-of-scope questions from post-cutoff documents.
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims. We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents. We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.