Related papers: InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents

URL: http://arxiv.org/abs/2601.03204v1
Date: Tue, 06 Jan 2026 17:35:57 GMT
Title: InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents
Authors: Chenglin Yu, Yuchen Wang, Songmiao Wang, Hongxia Yang, Ming Li,
Abstract summary: InfiAgent keeps the agent's reasoning context strictly bounded regardless of task duration. InfiAgent with a 20B open-source model is competitive with larger proprietary systems.
Score: 36.740230738304525
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM agents can reason and use tools, but they often break down on long-horizon tasks due to unbounded context growth and accumulated errors. Common remedies such as context compression or retrieval-augmented prompting introduce trade-offs between information fidelity and reasoning stability. We present InfiAgent, a general-purpose framework that keeps the agent's reasoning context strictly bounded regardless of task duration by externalizing persistent state into a file-centric state abstraction. At each step, the agent reconstructs context from a workspace state snapshot plus a fixed window of recent actions. Experiments on DeepResearch and an 80-paper literature review task show that, without task-specific fine-tuning, InfiAgent with a 20B open-source model is competitive with larger proprietary systems and maintains substantially higher long-horizon coverage than context-centric baselines. These results support explicit state externalization as a practical foundation for stable long-horizon agents. GitHub repo: https://github.com/ChenglinPoly/infiAgent
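
The context reconstruction step described in the abstract can be illustrated with a minimal Python sketch. This is not InfiAgent's actual implementation: the names `WINDOW`, `snapshot`, `build_context`, and `run`, the `notes.md` file, and the per-file preview truncation are all hypothetical choices made for illustration, and a fixed string stands in for the LLM call. The sketch only shows the general idea that a prompt rebuilt from on-disk state plus a fixed window of recent actions stops growing with the number of steps.

```python
from collections import deque
from pathlib import Path
import tempfile

WINDOW = 3  # hypothetical size of the recent-action window


def snapshot(workspace: Path, max_chars: int = 200) -> str:
    """List workspace files with a truncated preview of each (an assumption)."""
    lines = []
    for path in sorted(workspace.rglob("*")):
        if path.is_file():
            preview = path.read_text(encoding="utf-8")[:max_chars]
            lines.append(f"{path.relative_to(workspace)}: {preview}")
    return "\n".join(lines)


def build_context(workspace: Path, recent: deque) -> str:
    """Rebuild the prompt from persistent state plus the recent window only."""
    return (
        "WORKSPACE\n" + snapshot(workspace)
        + "\nRECENT ACTIONS\n" + "\n".join(recent)
    )


def run(steps: int) -> list[int]:
    """Simulate an agent loop and record the context length at each step."""
    sizes = []
    recent = deque(maxlen=WINDOW)  # older actions are dropped automatically
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        notes = workspace / "notes.md"
        for step in range(steps):
            context = build_context(workspace, recent)
            sizes.append(len(context))
            # Stand-in for an LLM call: the "action" persists a result to disk.
            notes.write_text(f"latest finding from step {step:04d}\n", encoding="utf-8")
            recent.append(f"step {step:04d}: wrote notes.md")
    return sizes
```

Calling `run(50)` returns one context length per step. Once the window has filled, the lengths stay constant in this toy setup, because nothing from the full action history is carried in the prompt. A real workspace would also need its snapshot bounded (for example by summarizing or listing files rather than previewing all of them), which this sketch does not address.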

Related papers

LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth [32.1520194112537]
Large language models (LLMs) are increasingly capable of carrying out long-running, real-world tasks. As the amount of context grows, their reliability often deteriorates, a phenomenon known as "context rot".
arXiv Detail & Related papers (2026-02-08T13:20:39Z)
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents [53.03492387564392]
We introduce FS-Researcher, a file-system-based framework that scales deep research beyond the context window via a persistent workspace. A Context Builder agent browses the internet, writes structured notes, and archives raw sources into a hierarchical knowledge base that can grow far beyond context length. A Report Writer agent then composes the final report section by section, treating the knowledge base as the source of facts.
arXiv Detail & Related papers (2026-02-02T03:00:19Z)
CaveAgent: Transforming LLMs into Stateful Runtime Operators [31.548422546991915]
We present CaveAgent, a framework that transforms the paradigm from "LLM-as-Text-Generator" to "LLM-as-Runtime-Operator". CaveAgent achieves a 10.5% success rate improvement on retail tasks and reduces total token consumption by 28.4% in multi-turn scenarios.
arXiv Detail & Related papers (2026-01-04T15:32:47Z)
SCOPE: Prompt Evolution for Enhancing Agent Effectiveness [53.75986399936395]
Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. While agents have access to this context, their static prompts lack the mechanisms to manage it effectively. We introduce SCOPE (Self-evolving Context Optimization via Prompt Evolution). We propose a Dual-Stream mechanism that balances tactical specificity (resolving immediate errors) with strategic generality (evolving long-term principles).
arXiv Detail & Related papers (2025-12-17T12:25:05Z)
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents [79.29376673236142]
Existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. We present NL2Repo-Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents.
arXiv Detail & Related papers (2025-12-14T15:12:13Z)
AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management [24.465443389008055]
AgentProg is a program-guided approach for agent context management. It reframes the interaction history as a program with variables and control flow. Experiments on AndroidWorld and our extended long-horizon task suite demonstrate that AgentProg achieves state-of-the-art success rates.
arXiv Detail & Related papers (2025-12-11T07:37:38Z)
AgentFold: Long-Horizon Web Agents with Proactive Context Management [98.54523771369018]
LLM-based web agents show immense promise for information seeking, yet their effectiveness is hindered by a fundamental trade-off in context management. We introduce AgentFold, a novel agent paradigm centered on proactive context management. With simple supervised fine-tuning, our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp and 47.3% on BrowseComp-ZH.
arXiv Detail & Related papers (2025-10-28T17:51:50Z)
DeepAgent: A General Reasoning Agent with Scalable Toolsets [111.6384541877723]
DeepAgent is an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution. To address the challenges of long-horizon interactions, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories. We develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens.
arXiv Detail & Related papers (2025-10-24T16:24:01Z)
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios [63.67884284105684]
We introduce UltraHorizon, a novel benchmark that measures the foundational capabilities essential for complex real-world challenges. Agents are designed in long-horizon discovery tasks where they must iteratively uncover hidden rules. Our experiments reveal that LLM-agents consistently underperform in these settings, whereas human participants achieve higher scores.
arXiv Detail & Related papers (2025-09-26T02:04:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.