Related papers: The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas

URL: http://arxiv.org/abs/2510.07091v1
Date: Wed, 08 Oct 2025 14:47:40 GMT
Title: The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas
Authors: Baixuan Xu, Tianshi Zheng, Zhaowei Wang, Hong Ting Tsang, Weiqi Wang, Tianqing Fang, Yangqiu Song
Abstract summary: This paper systematically studies the effectiveness of two different action representations. We propose a cognitive bandwidth perspective as a conceptual framework to qualitatively understand the differences. We provide an actionable guide for building more capable PwS agents for better scalable autonomy.
Score: 56.62286434195321
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Enabling LLMs to effectively operate on long-horizon tasks, which require long-term planning and multiple interactions, is essential for open-world autonomy. Conventional methods adopt planning with actions, where an executable action list is provided as reference. However, this action representation becomes impractical when the environment's action space is combinatorially explosive (e.g., the open-ended real world). This naturally raises a question: as the environmental action space scales, what is the optimal action representation for long-horizon agents? In this paper, we systematically study the effectiveness of two different action representations. The first is conventional planning with actions (PwA), which is predominantly adopted for its effectiveness on existing benchmarks. The other is planning with schemas (PwS), which instantiates an action schema into action lists (e.g., "move [OBJ] to [OBJ]" -> "move apple to desk") to ensure a concise action space and reliable scalability. This alternative is motivated by its alignment with human cognition and its compliance with environment-imposed action format restrictions. We propose a cognitive bandwidth perspective as a conceptual framework to qualitatively understand the differences between these two action representations, and we empirically observe a representation-choice inflection point between ALFWorld (~35 actions) and SciWorld (~500 actions), which serves as evidence of the need for scalable representations. We further conduct controlled experiments to study how the location of this inflection point interacts with different model capacities: stronger planning proficiency shifts the inflection rightward, whereas better schema instantiation shifts it leftward. Finally, noting the suboptimal performance of PwS agents, we provide an actionable guide for building more capable PwS agents for better scalable autonomy.
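To make the PwA/PwS contrast concrete, here is a minimal sketch (not the paper's implementation) of why a fully grounded PwA action list grows combinatorially while a PwS schema list stays fixed. Only the "[OBJ]" placeholder notation and the "move [OBJ] to [OBJ]" -> "move apple to desk" example come from the abstract; the schema set, the object names, and the helpers `arity`, `instantiate`, and `ground_all` are hypothetical illustrations.

```python
import itertools
import re

# Placeholder notation taken from the abstract's example schema.
SLOT = re.compile(r"\[OBJ\]")

# Hypothetical schema set, for illustration only.
SCHEMAS = ["look around", "take [OBJ]", "move [OBJ] to [OBJ]"]


def arity(schema: str) -> int:
    """Count the [OBJ] slots in a schema."""
    return len(SLOT.findall(schema))


def instantiate(schema: str, objects: list[str]) -> str:
    """PwS step: fill the schema's [OBJ] slots left to right with concrete objects."""
    if len(objects) != arity(schema):
        raise ValueError(f"{schema!r} needs {arity(schema)} objects, got {len(objects)}")
    fillers = iter(objects)
    # A function replacement avoids backslash-escape surprises in object names.
    return SLOT.sub(lambda _match: next(fillers), schema)


def ground_all(schemas: list[str], objects: list[str]) -> list[str]:
    """PwA view: enumerate every grounded action a prompt would have to list.

    This naive enumeration keeps nonsensical pairs such as "move apple to apple";
    a real environment would filter out invalid actions.
    """
    actions = []
    for schema in schemas:
        # product(..., repeat=0) yields one empty tuple, so "look around" appears once.
        for combo in itertools.product(objects, repeat=arity(schema)):
            actions.append(instantiate(schema, list(combo)))
    return actions


if __name__ == "__main__":
    print(instantiate("move [OBJ] to [OBJ]", ["apple", "desk"]))  # move apple to desk
    for n in (3, 10, 20):
        objects = [f"obj{i}" for i in range(n)]
        # Columns: object count, PwS list size (constant), PwA list size (grows ~n^2).
        print(n, len(SCHEMAS), len(ground_all(SCHEMAS, objects)))
```

Under these assumptions, three schemas over 20 objects already ground to 1 + 20 + 400 = 421 actions, while the schema list a PwS agent reasons over stays at three entries; the trade-off is that the agent must now perform the instantiation step itself correctly.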

Related papers

Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert [60.88976842557026]
Vision-Language Models (VLMs) have demonstrated impressive planning and reasoning capabilities. Recent dual-system approaches attempt to decouple "thinking" from "acting". We introduce a framework centered around a generalizable action expert.
Date: 2025-10-04T18:33:27Z
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning [47.27336786187929]
Vision-language-action (VLA) reasoning tasks require agents to interpret multimodal instructions, perform long-horizon planning, and act adaptively in dynamic environments. Existing approaches typically train VLA models in an end-to-end fashion, directly mapping inputs to actions without explicit reasoning. We propose ThinkAct, a dual-system framework that bridges high-level reasoning with low-level action execution via reinforced visual latent planning.
Date: 2025-07-22T17:59:46Z
GTA1: GUI Test-time Scaling Agent [97.58177633084915]
Graphical user interface (GUI) agents autonomously complete tasks across platforms (e.g., Linux) by sequentially decomposing user instructions into action proposals. This paper investigates the aforementioned challenges with our GUI Test-time Scaling Agent, namely GTA1.
Date: 2025-07-08T08:52:18Z
Adaptive Interactive Navigation of Quadruped Robots using Large Language Models [14.14967096139099]
We present a primitive tree for task planning with large language models (LLMs). We adopt reinforcement learning to pre-train a comprehensive skill library containing versatile locomotion and interaction behaviors for motion planning. Integrated with the tree structure, the replanning mechanism allows for convenient node addition and pruning.
Date: 2025-03-29T02:17:52Z
DynaSaur: Large Language Agents Beyond Predefined Actions [126.98162266986554]
Existing LLM agent systems typically select actions from a fixed and predefined set at every step. We propose an LLM agent framework that can dynamically create and compose actions as needed. In this framework, the agent interacts with its environment by generating and executing programs written in a general-purpose programming language.
Date: 2024-11-04T02:08:59Z
GLANCE: Global Actions in a Nutshell for Counterfactual Explainability [10.25011737760687]
We introduce GLANCE, a versatile and adaptive framework, comprising two algorithms. C-GLANCE employs a clustering approach that considers both the feature space and the space of counterfactual actions. T-GLANCE provides additional features to enhance flexibility.
Date: 2024-05-29T09:24:25Z
Deep hybrid models: infer and plan in a dynamic world [0.0]
We present an active inference approach that exploits discrete and continuous processing, based on three features. We show that the model can tackle the presented task under different conditions.
Date: 2024-02-01T15:15:25Z
AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training. We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
Date: 2023-08-16T22:47:16Z
Deliberative Acting, Online Planning and Learning with Hierarchical Operational Models [5.597986898418404]
In AI research, a plan of action has typically used descriptive models of the actions that abstractly specify what might happen as a result of an action. Executing the planned actions, however, has needed operational models, in which rich computational control structures and closed-loop online decision-making are used. We implement an integrated acting and planning system in which both planning and acting use the same operational models.
Date: 2020-10-02T14:50:05Z