Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models
- URL: http://arxiv.org/abs/2510.25766v3
- Date: Wed, 05 Nov 2025 22:09:36 GMT
- Title: Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models
- Authors: Sriram Balasubramanian, Samyadeep Basu, Koustava Goswami, Ryan Rossi, Varun Manjunatha, Roshan Santhosh, Ruiyi Zhang, Soheil Feizi, Nedim Lipka
- Abstract summary: We argue that post-hoc attribution can be reframed as a reasoning problem, where answers are decomposed into constituent units, each tied to specific context. We introduce DecompTune, a post-training method that teaches models to produce answer decompositions as intermediate reasoning steps. Across extensive experiments and ablations, DecompTune substantially improves attribution quality, outperforming prior methods and matching or exceeding state-of-the-art frontier models.
- Score: 64.49342399229529
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) are increasingly used for long-document question answering, where reliable attribution to sources is critical for trust. Existing post-hoc attribution methods work well for extractive QA but struggle in multi-hop, abstractive, and semi-extractive settings, where answers synthesize information across passages. To address these challenges, we argue that post-hoc attribution can be reframed as a reasoning problem, where answers are decomposed into constituent units, each tied to specific context. We first show that prompting models to generate such decompositions alongside attributions improves performance. Building on this, we introduce DecompTune, a post-training method that teaches models to produce answer decompositions as intermediate reasoning steps. We curate a diverse dataset of complex QA tasks, annotated with decompositions by a strong LLM, and post-train Qwen-2.5 (7B and 14B) using a two-stage SFT + GRPO pipeline with task-specific curated rewards. Across extensive experiments and ablations, DecompTune substantially improves attribution quality, outperforming prior methods and matching or exceeding state-of-the-art frontier models.
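The reframing described in the abstract (split an answer into constituent units, then tie each unit to specific context) can be illustrated with a toy sketch. This is not DecompTune: the paper trains a model to produce decompositions, whereas the sketch below assumes sentence splitting as the decomposition and plain word overlap as the matcher. The names `Attribution`, `decompose`, and `attribute` are hypothetical, and `passages` is assumed to be non-empty.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Attribution:
    unit: str            # one constituent unit of the answer
    passage_index: int   # index of the best-matching context passage
    overlap: float       # fraction of the unit's words found in that passage


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a learned matcher."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def decompose(answer: str) -> list[str]:
    """Assumed decomposition: split the answer into sentences."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def attribute(answer: str, passages: list[str]) -> list[Attribution]:
    """Tie each answer unit to the passage sharing the most words with it."""
    passage_tokens = [tokenize(p) for p in passages]
    results = []
    for unit in decompose(answer):
        unit_tokens = tokenize(unit)
        scores = [
            len(unit_tokens & pt) / len(unit_tokens) if unit_tokens else 0.0
            for pt in passage_tokens
        ]
        # Pick the passage with the highest overlap score for this unit.
        best = max(range(len(passages)), key=scores.__getitem__)
        results.append(Attribution(unit, best, scores[best]))
    return results


passages = [
    "Marie Curie was born in Warsaw in 1867.",
    "She won the Nobel Prize in Physics in 1903.",
]
answer = "Curie was born in Warsaw. She later won the Nobel Prize in Physics."
for a in attribute(answer, passages):
    print(a.passage_index, round(a.overlap, 2), a.unit)
```

Attributing per unit rather than per answer lets a multi-hop answer point at several passages at once, which is the case the abstract says existing post-hoc methods struggle with.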
Related papers
- ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages. ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z) - BMGQ: A Bottom-up Method for Generating Complex Multi-hop Reasoning Questions from Semi-structured Data [8.52473384574856]
We present an automated framework for generating high-difficulty, training-ready multi-hop questions from semi-structured knowledge sources. The system grows diverse, logically labeled evidence clusters through Natural Language Inference (NLI)-based relation typing and diversity-aware expansion.
arXiv Detail & Related papers (2025-10-28T07:43:15Z) - Decomposing and Revising What Language Models Generate [29.67882325906939]
We propose a new fact decomposition-based framework called FIDES for attributed QA. FIDES uses a contextually enhanced two-stage faithful decomposition method to decompose long-form answers into sub-facts. If the retrieved evidence snippets conflict with the related sub-facts, such sub-facts will be revised accordingly. FIDES outperforms the SOTA methods by over 14% on average with GPT-3.5-turbo, Gemini and Llama 70B series.
arXiv Detail & Related papers (2025-08-31T09:26:25Z) - Context Attribution with Multi-Armed Bandit Optimization [11.715006981206844]
We propose a novel framework that formulates context attribution as a combinatorial multi-armed bandit (CMAB) problem. We employ Combinatorial Thompson Sampling (CTS) to efficiently explore the exponentially large space of context subsets under a limited query budget. Our method defines a reward function based on normalized token likelihoods, capturing how well a subset of segments supports the original model response.
arXiv Detail & Related papers (2025-06-24T19:47:27Z) - QA-prompting: Improving Summarization with Large Language Models using Question-Answering [0.8460698440162888]
Language Models (LMs) have revolutionized natural language processing, enabling high-quality text generation through prompting and in-context learning. We propose QA-prompting - a simple prompting method for summarization that utilizes question-answering as an intermediate step prior to summary generation. Our method extracts key information and enriches the context of text to mitigate positional biases and improve summarization in a single LM call per task without requiring fine-tuning or pipelining.
arXiv Detail & Related papers (2025-05-20T13:29:36Z) - Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts [67.67746334493302]
Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet they often rely on external context to handle complex tasks. We propose a tri-encoder sequential retriever that models this process as a Markov Decision Process (MDP). We show that our method consistently and significantly outperforms baselines, underscoring the importance of explicitly modeling inter-example dependencies.
arXiv Detail & Related papers (2025-04-15T17:35:56Z) - SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z) - Jaeger: A Concatenation-Based Multi-Transformer VQA Model [0.13654846342364307]
Document-based Visual Question Answering poses a challenging task between linguistic sense disambiguation and fine-grained multimodal retrieval.
We propose Jaeger, a concatenation-based multi-transformer VQA model.
Our approach has the potential to amplify the performance of these models through concatenation.
arXiv Detail & Related papers (2023-10-11T00:14:40Z) - Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks [54.306234256074255]
We identify the issue of tokenization inconsistency that is commonly neglected in training generative models.
This issue damages the extractive nature of these tasks after the input and output are tokenized inconsistently.
We show that, with consistent tokenization, the model performs better in both in-domain and out-of-domain datasets.
arXiv Detail & Related papers (2022-12-19T23:33:21Z) - Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z) - Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.