Related papers: Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration

Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration

URL: http://arxiv.org/abs/2505.20625v2
Date: Thu, 10 Jul 2025 17:12:27 GMT
Title: Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration
Authors: Sibo Xiao, Zixin Lin, Wenyang Gao, Hui Chen, Yue Zhang,
Abstract summary: We propose a novel multi-agent framework for processing long contexts.<n>XpandA (Expand-Agent) is coupled with question-driven workflow and dynamic partitioning.<n>XpandA achieves 20% improvements and 1.5x inference speedup over baselines of full-context, RAG and previous agent-based methods.
Score: 11.477571238310276
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Processing long contexts has become a critical capability for modern large language models (LLMs). Existing works leverage agent-based divide-and-conquer methods for processing long contexts. But these methods face crucial limitations, including prohibitive accumulated latency and amplified information loss from excessive agent invocations, and the disruption of inherent textual dependencies by immoderate partitioning. In this paper, we propose a novel multi-agent framework XpandA (Expand-Agent) coupled with question-driven workflow and dynamic partitioning for robust long-context processing. XpandA overcomes these limitations through: 1) dynamic partitioning of long texts, which adaptively modulates the filling rate of context windows for input sequences of vastly varying lengths; 2) question-guided protocol to update flat information ensembles within centralized shared memory, constructing consistent inter-agent knowledge across partitions; and 3) selectively replaying specific partitions based on the state-tracking of question-information couples to promote the resolution of inverted-order structures across partitions (e.g., flashbacks). We perform a comprehensive evaluation of XpandA on multiple long-context benchmarks with length varying from 1k to 1M, demonstrating XpandA's feasibility for processing ultra-long sequences and its significant effectiveness in enhancing the long-context capabilities of various LLMs by achieving 20\% improvements and 1.5x inference speedup over baselines of full-context, RAG and previous agent-based methods.

Related papers

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent [53.82053723030023]
We introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy.<n>MemAgent has demonstrated superb long-context capabilities, being able to extrapolate from an 8K context trained on 32K text to a 3.5M QA task with performance loss 5% and achieves 95%+ in 512K RULER test.
arXiv Detail & Related papers (2025-07-03T03:11:50Z)
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks.<n>At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning.<n>We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z)
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers [58.98923344096319]
REFORM is a novel inference framework that efficiently handles long contexts through a two-phase approach.<n>It achieves over 50% and 27% performance gains on RULER and BABILong respectively at 1M context length.<n>It also outperforms baselines on Infinite-Bench and MM-NIAH, demonstrating flexibility across diverse tasks and domains.
arXiv Detail & Related papers (2025-06-01T23:49:14Z)
Towards Multi-Granularity Memory Association and Selection for Long-Term Conversational Agents [73.77930932005354]
We propose MemGAS, a framework that enhances memory consolidation by constructing multi-granularity association, adaptive selection, and retrieval.<n>MemGAS is based on multi-granularity memory units and employs Gaussian Mixture Models to cluster and associate new memories with historical ones.<n>Experiments on four long-term memory benchmarks demonstrate that MemGAS outperforms state-of-the-art methods on both question answer and retrieval tasks.
arXiv Detail & Related papers (2025-05-26T06:13:07Z)
InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation [57.310236384112834]
In-context learning (ICL) is critical for large language models (LLMs) but its effectiveness is constrained by finite context windows.<n>We introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory.<n>We demonstrate that our method reduces context length by 90% while achieving 103% average performance of full-context prompting.
arXiv Detail & Related papers (2025-04-02T13:15:44Z)
Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs [23.960451986662996]
This paper proposes a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought reasoning.<n>We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of distractor text.
arXiv Detail & Related papers (2025-02-18T02:49:40Z)
Does RAG Really Perform Bad For Long-Context Processing? [15.889864680212147]
RetroLM is a novel framework for long-context processing.<n>Unlike traditional methods, RetroLM employs KV-level retrieval augmentation.<n>Building on this framework, we develop a specialized retriever for precise retrieval of critical pages.
arXiv Detail & Related papers (2025-02-17T05:02:25Z)
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs [10.84210988032097]
We introduce Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables the efficient processing long-form sequences beyond the model's length limit.<n>We also introduce query dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content.
arXiv Detail & Related papers (2025-02-10T04:02:18Z)
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering [53.39158264785098]
Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task. We present an entirely end-to-end solution for VideoQA: Multi-granularity Contrastive cross-modal collaborative Generation model.
arXiv Detail & Related papers (2024-10-12T06:21:58Z)
LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models [73.13933847198395]
We propose a training-free framework for processing long texts, utilizing a divide-and-conquer strategy to achieve comprehensive document understanding. The proposed LLM$times$MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output.
arXiv Detail & Related papers (2024-10-12T03:13:44Z)
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks [39.27648679819897]
Chain-of-Agents (CoA) is a novel framework that harnesses multi-agent collaboration through natural language to enable information aggregation and context reasoning. CoA processes the entire input by interleaving reading and reasoning, and it mitigates long context focus issues by assigning each agent a short context.
arXiv Detail & Related papers (2024-06-04T23:36:08Z)
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading [63.93888816206071]
We introduce MemWalker, a method that processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information. We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text; pinpointing the relevant text segments related to the query.
arXiv Detail & Related papers (2023-10-08T06:18:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.