Related papers: LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data

LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data

URL: http://arxiv.org/abs/2502.12583v1
Date: Tue, 18 Feb 2025 06:40:23 GMT
Title: LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data
Authors: Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Shengjie Ma, Aofan Liu, Hui Xiong, Jian Guo,
Abstract summary: LongFaith is a novel pipeline for synthesizing faithful long-context reasoning instruction datasets.<n>By integrating ground truth and citation-based reasoning prompts, we eliminate distractions and improve the accuracy of reasoning chains.
Score: 19.79929012055293
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Despite the growing development of long-context large language models (LLMs), data-centric approaches relying on synthetic data have been hindered by issues related to faithfulness, which limit their effectiveness in enhancing model performance on tasks such as long-context reasoning and question answering (QA). These challenges are often exacerbated by misinformation caused by lack of verification, reasoning without attribution, and potential knowledge conflicts. We propose LongFaith, a novel pipeline for synthesizing faithful long-context reasoning instruction datasets. By integrating ground truth and citation-based reasoning prompts, we eliminate distractions and improve the accuracy of reasoning chains, thus mitigating the need for costly verification processes. We open-source two synthesized datasets, LongFaith-SFT and LongFaith-PO, which systematically address multiple dimensions of faithfulness, including verified reasoning, attribution, and contextual grounding. Extensive experiments on multi-hop reasoning datasets and LongBench demonstrate that models fine-tuned on these datasets significantly improve performance. Our ablation studies highlight the scalability and adaptability of the LongFaith pipeline, showcasing its broad applicability in developing long-context LLMs.

Related papers

Beyond Isolated Capabilities: Bridging Long CoT Reasoning and Long-Context Understanding [16.50502775216771]
Reasoning distillation has emerged as an effective approach to enhance the reasoning capabilities of smaller language models.<n>The impact of large-scale reasoning distillation on other critical abilities, particularly in-context retrieval and reasoning, remains unexplored.
arXiv Detail & Related papers (2025-07-20T07:43:16Z)
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning [80.26953590563232]
We formalize the paradigm of long-context reasoning RL, and identify key challenges in suboptimal training efficiency and unstable optimization process.<n>We propose QwenLong-L1, a framework that adapts short-context LRMs to long-context scenarios via progressive context scaling.<n> Experiments on seven long-context document question-answering benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B.
arXiv Detail & Related papers (2025-05-23T09:31:55Z)
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning [23.301029291780317]
We examine whether enhancing a model's long-context ability beforeSupervised Fine-Tuning (SFT) leads to improved reasoning performance.<n>Our results reveal a consistent trend: models with stronger long-context capacity achieve significantly higher accuracy on reasoning benchmarks after SFT.<n>These findings suggest that long-context modeling is not just essential for processing lengthy inputs, but also serves as a critical foundation for reasoning.
arXiv Detail & Related papers (2025-05-22T22:09:47Z)
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale [86.25450054683172]
WildLong extracts meta-information from real user queries to produce scalable data. It supports multi-document reasoning, such as cross-document comparison and aggregation. It surpasses existing open-source long-context-optimized models across benchmarks.
arXiv Detail & Related papers (2025-02-23T18:59:09Z)
When More is Less: Understanding Chain-of-Thought Length in LLMs [53.77747102201451]
Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs)<n>However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy?<n>In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases.
arXiv Detail & Related papers (2025-02-11T05:28:59Z)
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion [20.293369733522983]
LongReason is a synthetic benchmark for evaluating the long-context reasoning capabilities of large language models. LongReason consists of 794 multiple-choice reasoning questions with diverse reasoning patterns across three task categories. We evaluate 21 LLMs on LongReason, revealing that most models experience significant performance drops as context length increases.
arXiv Detail & Related papers (2025-01-25T05:32:14Z)
NExtLong: Toward Effective Long-Context Training without Long Documents [28.002824369635768]
We propose NExtLong, a novel framework for long-context data through Negative document Extension.<n> NExtLong decomposes a document into multiple meta-chunks and extends the context by interleaving hard negative distractors retrieved from pretraining corpora.<n>Extensive experiments demonstrate that NExtLong achieves significant performance improvements compared to existing long-context synthesis approaches.
arXiv Detail & Related papers (2025-01-22T10:01:54Z)
Understanding Synthetic Context Extension via Retrieval Heads [51.8869530817334]
We investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning.<n>We find that models trained on synthetic data fall short of the real data, but surprisingly, the mismatch can be interpreted.<n>Our results shed light on how to interpret synthetic data fine-tuning performance and how to approach creating better data for learning real-world capabilities over long contexts.
arXiv Detail & Related papers (2024-10-29T17:55:00Z)
LongReward: Improving Long-context Large Language Models with AI Feedback [54.3321542678909]
LongReward is a novel method that provides rewards for long-context model responses from four human-valued dimensions. Our experiments indicate that LongReward not only significantly improves models' long-context performance but also enhances their ability to follow short instructions.
arXiv Detail & Related papers (2024-10-28T17:50:42Z)
GATEAU: Selecting Influential Samples for Long Context Alignment [62.87020831987625]
GATEAU identifies influential samples enriched with long-range dependency relations.<n>Experiments indicate that GATEAU effectively identifies influential samples and the model trained on these selected samples exhibits better instruction-following and long-context understanding capabilities.
arXiv Detail & Related papers (2024-10-21T04:30:53Z)
ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering [42.146660039671076]
We develop a retrieve-then-reason framework for large language models (LLMs) We find that modern LLMs struggle to accurately retrieve relevant facts and instead, often hallucinate "retrieved facts" We introduce ALR$2$, a method that augments the long-context reasoning capability of LLMs via an explicit two-stage procedure.
arXiv Detail & Related papers (2024-10-04T08:29:12Z)
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. We propose a novel long-context benchmark, Loong, aligning with realistic scenarios through extended multi-document question answering (QA) Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z)
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens. We develop two novel methods for creating synthetic data. LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.