Related papers: Atomic Fact Decomposition Helps Attributed Question Answering

Atomic Fact Decomposition Helps Attributed Question Answering

URL: http://arxiv.org/abs/2410.16708v1
Date: Tue, 22 Oct 2024 05:25:54 GMT
Title: Atomic Fact Decomposition Helps Attributed Question Answering
Authors: Zhichao Yan, Jiapu Wang, Jiaoyan Chen, Xiaoli Li, Ru Li, Jeff Z. Pan,
Abstract summary: Attributed Question Answering (AQA) aims to provide both a trustworthy answer and a reliable attribution report for a question. This paper proposes an Atomic fact decomposition-based Retrieval and Editing framework. It decomposes the generated long-form answers into molecular clauses and atomic facts by the instruction-tuned LLMs.
Score: 30.75332718824254
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Attributed Question Answering (AQA) aims to provide both a trustworthy answer and a reliable attribution report for a given question. Retrieval is a widely adopted approach, including two general paradigms: Retrieval-Then-Read (RTR) and post-hoc retrieval. Recently, Large Language Models (LLMs) have shown remarkable proficiency, prompting growing interest in AQA among researchers. However, RTR-based AQA often suffers from irrelevant knowledge and rapidly changing information, even when LLMs are adopted, while post-hoc retrieval-based AQA struggles with comprehending long-form answers with complex logic, and precisely identifying the content needing revision and preserving the original intent. To tackle these problems, this paper proposes an Atomic fact decomposition-based Retrieval and Editing (ARE) framework, which decomposes the generated long-form answers into molecular clauses and atomic facts by the instruction-tuned LLMs. Notably, the instruction-tuned LLMs are fine-tuned using a well-constructed dataset, generated from large scale Knowledge Graphs (KGs). This process involves extracting one-hop neighbors from a given set of entities and transforming the result into coherent long-form text. Subsequently, ARE leverages a search engine to retrieve evidences related to atomic facts, inputting these evidences into an LLM-based verifier to determine whether the facts require expansion for re-retrieval or editing. Furthermore, the edited facts are backtracked into the original answer, with evidence aggregated based on the relationship between molecular clauses and atomic facts. Extensive evaluations demonstrate the superior performance of our proposed method over the state-of-the-arts on several datasets, with an additionally proposed new metric $Attr_{p}$ for evaluating the precision of evidence attribution.

Related papers

ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages.<n>ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z)
FAIR-RAG: Faithful Adaptive Iterative Refinement for Retrieval-Augmented Generation [0.0]
We introduce FAIR-RAG, a novel agentic framework that transforms the standard RAG pipeline into a dynamic, evidence-driven reasoning process.<n>We conduct experiments on challenging multi-hop QA benchmarks, including HotpotQA, 2WikiMultiHopQA, and MusiQue.<n>Our work demonstrates that a structured, evidence-driven refinement process with explicit gap analysis is crucial for unlocking reliable and accurate reasoning in advanced RAG systems.
arXiv Detail & Related papers (2025-10-25T15:59:33Z)
Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation.<n>We compiled a large-scale dataset of real-world complex queries from a major search engine.<n> Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
arXiv Detail & Related papers (2025-09-08T10:58:42Z)
Decomposing and Revising What Language Models Generate [29.67882325906939]
We propose a new fact decomposition-based framework called FIDES for attributed QA.<n>Fides uses a contextually enhanced two-stage faithful decomposition method to decompose long form answers into sub-facts.<n>If the retrieved evidence snippets conflict with the related sub-facts, such sub-facts will be revised accordingly.<n>Fides outperforms the SOTA methods by over 14% in average with GPT-3.5-turbo, Gemini and Llama 70B series.
arXiv Detail & Related papers (2025-08-31T09:26:25Z)
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models [9.211266032947497]
We demonstrate that fact retrieval is substantially more difficult than isolated point-wise queries.<n>Our experiments reveal that even stateofthe-art LLMs struggle significantly, not exceeding 25% factual accuracy.<n>These findings underscore limitations in current LLMs' ability to synthesize structured factual knowledge.
arXiv Detail & Related papers (2025-05-27T16:33:38Z)
Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.13261761812517]
We introduce FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a novel method for hallucination detection in RAG outputs.<n>We present a new long-form Question Answering (QA) dataset annotated for both factuality and faithfulness.
arXiv Detail & Related papers (2025-05-27T11:56:59Z)
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs [25.800565994304847]
Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir.<n>Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources.<n>We propose AutoRefine, a reinforcement learning framework that adopts a new search-and-refine-during-think'' paradigm.
arXiv Detail & Related papers (2025-05-16T14:11:29Z)
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models [59.171510592986735]
We propose FactReasoner, a new factuality assessor that relies on probabilistic reasoning to assess the factuality of a long-form generated response. Our experiments on labeled and unlabeled benchmark datasets demonstrate clearly that FactReasoner improves considerably over state-of-the-art prompt-based approaches.
arXiv Detail & Related papers (2025-02-25T19:01:48Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
MRAG: A Modular Retrieval Framework for Time-Sensitive Question Answering [3.117448929160824]
temporal relations and answering time-sensitive questions is a challenging task for question-answering systems powered by large language models (LLMs) We introduce the TempRAGEval benchmark, which repurposes existing datasets by incorporating temporal perturbations and gold evidence labels. On TempRAGEval, MRAG significantly outperforms baseline retrievers in retrieval performance, leading to further improvements in final answer accuracy.
arXiv Detail & Related papers (2024-12-20T03:58:27Z)
A Claim Decomposition Benchmark for Long-form Answer Verification [42.27949634354242]
We introduce a new claim decomposition benchmark, which requires building system that can identify atomic and checkworthy claims for LLM responses. The CACDD encompasses a collection of 500 human-annotated question-answer pairs, including a total of 4956 atomic claims. Results show that the claim decomposition is highly challenging and requires further explorations.
arXiv Detail & Related papers (2024-10-16T13:34:51Z)
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks. Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes. In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
arXiv Detail & Related papers (2024-10-11T14:03:29Z)
CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering [33.89497991289916]
We propose a novel rewriting method CoTKR, Chain-of-Thought Enhanced Knowledge Rewriting, for generating reasoning traces and corresponding knowledge in an interleaved manner. We conduct experiments using various Large Language Models (LLMs) across several Knowledge Graph Question Answering (KGQA) benchmarks.
arXiv Detail & Related papers (2024-09-29T16:08:45Z)
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
RAG has been widely adopted to enhance Large Language Models (LLMs) Attributed Text Generation (ATG) has attracted growing attention, which provides citations to support the model's responses in RAG. This paper proposes a fine-grained ATG method called ReClaim(Refer & Claim), which alternates the generation of references and answers step by step.
arXiv Detail & Related papers (2024-07-01T20:47:47Z)
Optimization of Retrieval-Augmented Generation Context with Outlier Detection [0.0]
We focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems. Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers. It was found that the greatest improvements were achieved with increasing complexity of the questions and answers.
arXiv Detail & Related papers (2024-07-01T15:53:29Z)
SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [85.54906813106683]
We propose a simple yet effective framework to enhance open-domain question answering (ODQA) with large language models (LLMs) SuRe helps LLMs predict more accurate answers for a given question, which are well-supported by the summarized retrieval (SuRe) Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches.
arXiv Detail & Related papers (2024-04-17T01:15:54Z)
RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation [42.82192656794179]
Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses. This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in unseen scenarios. Retrieval-Augmented Generation (RAG) addresses this by incorporating external, relevant documents into the response generation process.
arXiv Detail & Related papers (2024-03-31T08:58:54Z)
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning [73.51314109184197]
It is crucial for large language models (LLMs) to understand the concept of temporal knowledge. We propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning.
arXiv Detail & Related papers (2023-11-16T11:49:29Z)
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection. It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z)
DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases [81.19499764899359]
We propose a novel framework DecAF that jointly generates both logical forms and direct answers. DecAF achieves new state-of-the-art accuracy on WebQSP, FreebaseQA, and GrailQA benchmarks.
arXiv Detail & Related papers (2022-09-30T19:51:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.