Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2510.26389v1
- Date: Thu, 30 Oct 2025 11:32:45 GMT
- Title: Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning
- Authors: Wenchang Duan, Yaoliang Yu, Jiwan He, Yi Shi,
- Abstract summary: deep multi-agent reinforcement learning (MARL) has demonstrated promising performance for solving challenging tasks.<n>We propose a novel MARL framework to obtain adaptive and effective contextual information.<n>We show that the proposed method achieves state-of-the-art (SOTA) performance on long-term dependency tasks.
- Score: 22.147022270074373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep multi-agent reinforcement learning (MARL) has demonstrated promising performance for solving challenging tasks, such as long-term dependencies and non-Markovian environments. Its success is partly attributed to conditioning policies on large fixed context length. However, such large fixed context lengths may lead to limited exploration efficiency and redundant information. In this paper, we propose a novel MARL framework to obtain adaptive and effective contextual information. Specifically, we design a central agent that dynamically optimizes context length via temporal gradient analysis, enhancing exploration to facilitate convergence to global optima in MARL. Furthermore, to enhance the adaptive optimization capability of the context length, we present an efficient input representation for the central agent, which effectively filters redundant information. By leveraging a Fourier-based low-frequency truncation method, we extract global temporal trends across decentralized agents, providing an effective and efficient representation of the MARL environment. Extensive experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on long-term dependency tasks, including PettingZoo, MiniGrid, Google Research Football (GRF), and StarCraft Multi-Agent Challenge v2 (SMACv2).
Related papers
- URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding [55.45331924836242]
We present URaG, a framework that Unifies Retrieval and Generation within a single MLLM.<n>We show that URaG achieves state-of-the-art performance while reducing computational overhead by 44-56%.
arXiv Detail & Related papers (2025-11-13T17:54:09Z) - Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management [19.980762483472354]
We introduce summarization-based context management to training.<n>We instantiate this framework with underlineSUmmarization augmented underlinePolicy underlineOptimization (textttSUPO)<n>Our results establish summarization-based context management as a principled and scalable approach for training RL agents beyond a fixed context length limit.
arXiv Detail & Related papers (2025-10-08T07:29:22Z) - Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding [5.353135097018941]
Retrieval-Augmented Generation (RAG) is a framework for grounding Large Language Models (LLMs) in external, up-to-date information.<n>We propose LDAR (Learning Distraction-Aware Retrieval), an adaptive retriever that learns to retrieve contexts in a way that mitigates interference from distracting passages.
arXiv Detail & Related papers (2025-09-26T04:40:42Z) - REFRAG: Rethinking RAG based Decoding [67.4862300145604]
REFRAG is an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications.<n>We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization.
arXiv Detail & Related papers (2025-09-01T03:31:44Z) - Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization [56.97588709890706]
LongMab-PO is a novel framework that generates high-quality and diverse responses for long-context modeling tasks.<n> Experimental results show that LongMab-PO significantly improves the diversity and quality of preference data pairs.
arXiv Detail & Related papers (2025-08-19T16:33:55Z) - Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z) - PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention [73.26995918610669]
Large Language Models (LLMs) face efficiency bottlenecks due to the quadratic complexity of the attention mechanism when processing long contexts.<n>We introduce PowerAttention, a novel sparse attention design that facilitates effective and complete context extension.<n>Experiments demonstrate that PowerAttention outperforms existing static sparse attention methods by $5sim 40%$.
arXiv Detail & Related papers (2025-03-05T15:24:11Z) - Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation [42.590255022001145]
Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths.<n>We show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity.
arXiv Detail & Related papers (2025-03-03T17:59:48Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - Bidirectional Long-Range Parser for Sequential Data Understanding [3.76054468268713]
We introduce BLRP (Bidirectional Long-Range), a novel and versatile attention mechanism designed to increase performance and efficiency on long-sequence tasks.
We show the benefits and versatility of our approach on vision and language domains by demonstrating competitive results against state-of-the-art methods.
arXiv Detail & Related papers (2024-04-08T05:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.