ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2511.00489v1
- Date: Sat, 01 Nov 2025 10:43:58 GMT
- Title: ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models
- Authors: Jiani Guo, Zuchao Li, Jie Wu, Qianren Wang, Yun Li, Lefei Zhang, Hai Zhao, Yujiu Yang
- Abstract summary: ToM is a novel Tree-oriented MapReduce framework for long-context reasoning. We show that ToM significantly outperforms existing divide-and-conquer frameworks and retrieval-augmented generation methods.
- Score: 107.86069298500855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs), constrained by limited context windows, often face significant performance degradation when reasoning over long contexts. To address this, Retrieval-Augmented Generation (RAG) retrieves and reasons over chunks but frequently sacrifices logical coherence due to its reliance on similarity-based rankings. Similarly, divide-and-conquer frameworks (DCF) split documents into small chunks for independent reasoning and aggregation. While effective for local reasoning, DCF struggles to capture long-range dependencies and risks inducing conflicts by processing chunks in isolation. To overcome these limitations, we propose ToM, a novel Tree-oriented MapReduce framework for long-context reasoning. ToM leverages the inherent hierarchical structure of long documents (e.g., main headings and subheadings) by constructing a DocTree through hierarchical semantic parsing and performing bottom-up aggregation. Using a Tree MapReduce approach, ToM enables recursive reasoning: in the Map step, rationales are generated at child nodes; in the Reduce step, these rationales are aggregated across sibling nodes to resolve conflicts or reach consensus at parent nodes. Experimental results on 70B+ LLMs show that ToM significantly outperforms existing divide-and-conquer frameworks and retrieval-augmented generation methods, achieving better logical coherence and long-context reasoning. Our code is available at https://github.com/gjn12-31/ToM .
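The recursion described in the abstract is straightforward to sketch. Below is a minimal illustration of the Tree MapReduce loop, assuming a hypothetical `llm()` completion function and a DocTree already parsed from the document's headings; the actual prompts and parsing logic live in the linked repository and may differ.

```python
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Stand-in for a real LLM completion call (hypothetical helper)."""
    return f"[rationale for: {prompt[:40]}...]"

@dataclass
class DocNode:
    heading: str
    text: str = ""                                  # content under this heading
    children: list["DocNode"] = field(default_factory=list)

def tree_mapreduce(node: DocNode, question: str) -> str:
    # Leaf: reason directly over the local section text.
    if not node.children:
        return llm(f"Q: {question}\nSection '{node.heading}':\n{node.text}")
    # Map: produce a rationale at every child node, recursively.
    rationales = [tree_mapreduce(child, question) for child in node.children]
    # Reduce: aggregate sibling rationales at the parent, resolving
    # conflicts or reaching consensus before passing the result upward.
    joined = "\n".join(f"- {r}" for r in rationales)
    return llm(f"Q: {question}\nUnder '{node.heading}', sibling rationales:\n"
               f"{joined}\nResolve conflicts and state a consensus rationale.")

doc = DocNode("Report", children=[DocNode("Methods", text="..."),
                                  DocNode("Results", text="...")])
print(tree_mapreduce(doc, "What do the results imply?"))
```

Note the design payoff: leaves see only local text and interior nodes see only their children's rationales, so each LLM call stays short regardless of total document length.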
Related papers
- Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation [22.803751188961865]
We argue retrieval should move beyond similarity matching and instead operate over latent components. We propose xMemory, which builds a hierarchy of intact units and maintains a searchable high-level node organisation.
arXiv Detail & Related papers (2026-02-02T12:04:58Z)
- DMAP: Human-Aligned Structural Document Map for Multimodal Document Understanding [30.54420648726099]
The Document-level structural Document MAP (DMAP) encodes both hierarchical organization and inter-element relationships within multimodal documents. Building upon this representation, a Reflective Reasoning Agent performs structure-aware and evidence-driven reasoning. Experiments on MMDocQA benchmarks demonstrate that DMAP yields document-specific structural representations aligned with human interpretive patterns.
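As a rough illustration of what such a structural map might look like, here is a minimal sketch combining a hierarchy of elements with typed inter-element links; all names are hypothetical, and the paper's actual representation and reasoning agent are more elaborate.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    eid: str
    kind: str                 # "heading", "paragraph", "figure", "table", ...
    content: str
    children: list["Element"] = field(default_factory=list)

@dataclass
class DocMap:
    root: Element             # hierarchical organization
    # Typed inter-element relationships alongside the hierarchy
    # (relation names here are purely illustrative).
    links: list[tuple[str, str, str]] = field(default_factory=list)

    def neighbors(self, eid: str) -> list[str]:
        """Elements linked to `eid`, usable as structure-aware evidence."""
        out = [t for s, _, t in self.links if s == eid]
        out += [s for s, _, t in self.links if t == eid]
        return out

doc = DocMap(
    root=Element("r", "heading", "Annual Report", children=[
        Element("p1", "paragraph", "Revenue grew; see Figure 1."),
        Element("f1", "figure", "Figure 1: revenue by quarter"),
    ]),
    links=[("p1", "references", "f1")],
)
print(doc.neighbors("p1"))    # -> ['f1']
```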
arXiv Detail & Related papers (2026-01-26T06:38:25Z)
- TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG [71.06073770344732]
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval. We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining outcome-only rewards.
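One simple way to derive step-wise credit from outcome-only rewards in a search tree is to score each node by the mean outcome of the leaves beneath it and credit each step by the change in that value. The sketch below illustrates this idea only; it is not necessarily the paper's exact estimator.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    children: list["Node"] = field(default_factory=list)
    outcome: float | None = None      # outcome-only reward at a finished leaf

def leaf_outcomes(node: Node) -> list[float]:
    if node.outcome is not None:
        return [node.outcome]
    return [o for child in node.children for o in leaf_outcomes(child)]

def value(node: Node) -> float:
    """Mean final outcome over all trajectories passing through this node."""
    outcomes = leaf_outcomes(node)
    return sum(outcomes) / len(outcomes)

def step_advantages(node: Node) -> list[float]:
    """Per-step credit: how much each child's subtree improves on the parent."""
    v = value(node)
    return [value(child) - v for child in node.children]

root = Node(children=[Node(outcome=1.0),
                      Node(children=[Node(outcome=0.0), Node(outcome=1.0)])])
print(step_advantages(root))          # -> [0.333..., -0.166...]
```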
arXiv Detail & Related papers (2026-01-11T14:07:30Z)
- AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees [66.39371821756649]
We propose AdmTree, a novel framework for adaptive, hierarchical context compression. AdmTree segments input based on information density, utilizing gist tokens to summarize variable-length segments as the leaves of a semantic binary tree. By preserving fine-grained details alongside global semantic coherence, mitigating positional bias, and dynamically adapting to content, AdmTree robustly retains the semantic information of long contexts.
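A toy sketch of the idea: cut shorter segments where a crude information-density proxy is low, compress each segment to a "gist", and merge gists pairwise into a binary tree. The density heuristic and the `gist()` stub here are stand-ins, not the paper's method.

```python
import re

def density(segment: str) -> float:
    """Crude information-density proxy: unique-token ratio."""
    tokens = re.findall(r"\w+", segment.lower())
    return len(set(tokens)) / max(len(tokens), 1)

def gist(text: str) -> str:
    return text[:30] + "..."          # stand-in for gist-token compression

def segment_adaptive(text: str, max_words: int = 40) -> list[str]:
    """Shorter segments where density is low, longer ones where it is high."""
    segments, current = [], []
    for word in text.split():
        current.append(word)
        limit = max_words if density(" ".join(current)) > 0.6 else max_words // 2
        if len(current) >= limit:
            segments.append(" ".join(current))
            current = []
    if current:
        segments.append(" ".join(current))
    return segments

def build_tree(leaves: list[str]) -> str:
    """Merge gists pairwise until a single root summary remains."""
    level = [gist(s) for s in leaves]
    while len(level) > 1:
        level = [gist(" ".join(level[i:i + 2])) for i in range(0, len(level), 2)]
    return level[0]

print(build_tree(segment_adaptive("some long document text " * 30)))
```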
arXiv Detail & Related papers (2025-12-04T08:04:19Z)
- Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding [49.26132236798123]
Vision Language Models (VLMs) have gradually become a primary approach in document understanding. We propose SLEUTH, a multi-agent framework that orchestrates a retriever and four collaborative agents in a coarse-to-fine process. The framework identifies key textual and visual clues within the retrieved pages, filters for salient visual evidence such as tables and charts, and analyzes the query to devise a reasoning strategy.
arXiv Detail & Related papers (2025-11-28T03:09:40Z)
- LLM-guided Hierarchical Retrieval [54.73080745446999]
LATTICE is a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity. A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy. Our framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark.
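The logarithmic cost comes from descending a corpus tree one level at a time, judging only the current node's children. A minimal sketch follows, with a toy `relevance()` stub standing in for the LLM's (noisy) relevance judgments and hypothetical data; the paper's calibration machinery for handling that noise is not shown.

```python
import random
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    summary: str
    children: list["TreeNode"] = field(default_factory=list)
    doc: str | None = None            # leaves hold the actual documents

def relevance(query: str, summary: str) -> float:
    """Toy stand-in for an LLM relevance judgment (noisy in practice)."""
    return sum(word in summary for word in query.split()) + random.random() * 0.1

def navigate(root: TreeNode, query: str) -> str | None:
    node, visited = root, 1
    while node.children:              # one judgment round per level
        node = max(node.children, key=lambda c: relevance(query, c.summary))
        visited += 1
    print(f"visited {visited} nodes")  # grows ~logarithmically with corpus size
    return node.doc

leaf = lambda s: TreeNode(summary=s, doc=s)
root = TreeNode("all documents", children=[
    TreeNode("pets: cats and dogs", children=[leaf("cats purr"), leaf("dogs bark")]),
    TreeNode("physics: quarks, light", children=[leaf("quarks bind"), leaf("light bends")]),
])
print(navigate(root, "why do dogs bark"))
```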
arXiv Detail & Related papers (2025-10-15T07:05:17Z)
- Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning [11.045096250408067]
Tree of Agents (TOA) is a multi-agent reasoning framework that segments the input into chunks processed by independent agents. TOA enables agents to probe different reasoning orders for multi-perspective understanding. To improve processing efficiency, we incorporate prefix-hash caching and adaptive pruning strategies.
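Prefix-hash caching is easy to illustrate: if the agent state after reading a chunk prefix depends only on the question and that prefix, any two reading orders sharing a prefix can reuse the cached state. A sketch with a hypothetical `llm()` stub (not the paper's implementation):

```python
import hashlib
from itertools import permutations

def llm(prompt: str) -> str:
    return f"[notes after: ...{prompt[-30:]}]"     # hypothetical LLM stub

cache: dict[str, str] = {}

def process_prefix(question: str, prefix: tuple[str, ...]) -> str:
    """Agent state after reading `prefix`; deterministic given (question,
    prefix), so orders sharing a prefix reuse the cached state."""
    key = hashlib.sha256("|".join((question,) + prefix).encode()).hexdigest()
    if key not in cache:
        prev = process_prefix(question, prefix[:-1]) if len(prefix) > 1 else question
        cache[key] = llm(prev + "\nnext chunk: " + prefix[-1])
    return cache[key]

def tree_of_agents(chunks: list[str], question: str, k: int = 3) -> list[str]:
    # Probe several reading orders for multi-perspective understanding;
    # adaptive pruning (not shown) would drop unpromising orders early.
    orders = list(permutations(chunks))[:k]
    return [process_prefix(question, order) for order in orders]

views = tree_of_agents(["chunk A", "chunk B", "chunk C"], "What happened?")
print(f"{len(cache)} cached prefix states for {len(views)} reading orders")
```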
arXiv Detail & Related papers (2025-09-08T08:34:02Z)
- When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework [39.66331560468973]
We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that distinguishes three failure modes of long-context tasks: cross-chunk dependence (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise).
arXiv Detail & Related papers (2025-06-19T15:49:34Z)
- Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning [30.54506564763053]
We introduce ImplexConv, a large-scale long-term dataset with 2,500 examples, each containing approximately 100 conversation sessions. We also propose TaciTree, a novel hierarchical tree framework that structures conversation history into multiple levels of summarization.
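A minimal sketch of that kind of multi-level summarization, with a stand-in `summarize()` where the paper would use an LLM; the fanout and search strategy here are illustrative choices, not TaciTree's exact design.

```python
def summarize(texts: list[str]) -> str:
    return " / ".join(t[:20] for t in texts)   # stand-in for an LLM summary

def build_levels(sessions: list[str], fanout: int = 4) -> list[list[str]]:
    """Level 0 holds raw sessions; each higher level summarizes `fanout` nodes."""
    levels = [sessions]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([summarize(prev[i:i + fanout])
                       for i in range(0, len(prev), fanout)])
    return levels

levels = build_levels([f"session {i}: ..." for i in range(10)])
# Answer implicit queries top-down: scan the coarse summaries first and
# descend only into branches that look relevant.
for depth, level in enumerate(reversed(levels)):
    print(f"depth {depth}: {len(level)} node(s)")
```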
arXiv Detail & Related papers (2025-03-10T07:59:41Z)
- Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls [83.89771461061903]
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs), yet the search itself can be inefficient. We identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring. We propose FETCH, a flexible, plug-and-play system compatible with various tree search algorithms.
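Both fixes can be sketched in a few lines: collapse states that are semantically equivalent, and average the verifier scores of the merged copies to reduce variance. Real equivalence detection would use embeddings or an LLM; the normalization below is only illustrative.

```python
from collections import defaultdict

def canonical(state: str) -> str:
    """Stand-in for semantic-equivalence detection; a real system would use
    embeddings or an LLM rather than whitespace/case normalization."""
    return " ".join(state.lower().split())

def merge_states(frontier: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Collapse equivalent states (curbing over-exploration) and average their
    verifier scores (reducing the variance behind under-exploration)."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for state, score in frontier:
        buckets[canonical(state)].append(score)
    return [(s, sum(v) / len(v)) for s, v in buckets.items()]

frontier = [("x = 4", 0.9), ("X =  4", 0.4), ("x = 5", 0.3)]
print(merge_states(frontier))    # -> [('x = 4', 0.65), ('x = 5', 0.3)]
```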
arXiv Detail & Related papers (2025-02-16T16:12:01Z)
- ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches. Our evaluations show that ReTreever generally preserves full representation accuracy.
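A toy version of per-node routing: each internal node of a binary tree holds a linear scorer, and the bit-string path an embedding takes through the tree serves as its coarse-to-fine code. The random "routers" below stand in for the jointly learned ones; the training procedure is not shown.

```python
import math
import random

DIM, DEPTH = 8, 3
random.seed(0)
# One linear router per internal node of a depth-3 binary tree; random
# weights here stand in for ReTreever's learned routing functions.
routers = {node: [random.gauss(0, 1) for _ in range(DIM)]
           for node in range(2 ** DEPTH - 1)}

def route(vec: list[float]) -> str:
    """Descend the tree; the bit-string path is a coarse-to-fine code."""
    node, path = 0, ""
    for _ in range(DEPTH):
        logit = sum(w * x for w, x in zip(routers[node], vec))
        bit = int(1 / (1 + math.exp(-logit)) > 0.5)   # sigmoid, then threshold
        path += str(bit)
        node = 2 * node + 1 + bit                     # heap-style child index
    return path

doc = [random.random() for _ in range(DIM)]
query = [x + random.gauss(0, 0.01) for x in doc]      # a query close to the doc
print(route(doc), route(query))  # nearby vectors tend to share path prefixes
```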
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
- LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models [73.13933847198395]
We propose a training-free framework for processing long texts, utilizing a divide-and-conquer strategy to achieve comprehensive document understanding.
The proposed LLM$\times$MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output.
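The flat, single-level variant of this map-reduce pattern fits in a few lines, again with a hypothetical `llm()` stub; contrast it with the recursive DocTree version sketched under the ToM abstract above.

```python
def llm(prompt: str) -> str:
    return f"[answer from: {prompt[:25]}...]"     # hypothetical LLM stub

def chunk(document: str, size: int = 2000) -> list[str]:
    return [document[i:i + size] for i in range(0, len(document), size)]

def map_reduce(document: str, question: str) -> str:
    # Map: each chunk is read and answered independently.
    partials = [llm(f"Q: {question}\nContext: {c}") for c in chunk(document)]
    # Reduce: aggregate the intermediate answers into the final output.
    return llm(f"Q: {question}\nPartial answers:\n" + "\n".join(partials))

print(map_reduce("long document text " * 500, "Summarize the key claim."))
```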
arXiv Detail & Related papers (2024-10-12T03:13:44Z)
- Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation [1.4665304971699265]
HAT encapsulates information from child nodes, enabling broad coverage with depth control.
Experiments show HAT improves dialog coherence and summary quality over baseline contexts.
arXiv Detail & Related papers (2024-06-10T09:29:08Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed Forest R-CNN, can serve as a plug-and-play module that can be applied to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)