SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning
- URL: http://arxiv.org/abs/2503.11951v2
- Date: Tue, 18 Mar 2025 05:00:47 GMT
- Title: SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning
- Authors: Edward Y. Chang, Longling Geng
- Abstract summary: SagaLLM is a structured multi-agent framework that addresses four fundamental limitations in current LLM approaches. By implementing specialized context management agents and validation protocols, SagaLLM preserves critical constraints and state information throughout complex planning processes.
- Score: 2.1331883629523634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent LLM-based agent frameworks have demonstrated impressive capabilities in task delegation and workflow orchestration, but face significant challenges in maintaining context awareness and ensuring planning consistency. This paper presents SagaLLM, a structured multi-agent framework that addresses four fundamental limitations in current LLM approaches: inadequate self-validation, context narrowing, a lack of transaction properties, and insufficient inter-agent coordination. By implementing specialized context management agents and validation protocols, SagaLLM preserves critical constraints and state information throughout complex planning processes, enabling robust and consistent decision-making even during disruptions. We evaluate our approach on selected problems from the REALM benchmark, focusing on sequential and reactive planning scenarios that challenge both context retention and adaptive reasoning. Our experiments with state-of-the-art LLMs (Claude 3.7, DeepSeek R1, GPT-4o, and GPT-o1) demonstrate that while these models exhibit impressive reasoning capabilities, they struggle to maintain global constraint awareness during complex planning tasks, particularly when adapting to unexpected changes. In contrast, the distributed cognitive architecture of SagaLLM shows significant improvements in planning consistency, constraint enforcement, and adaptation to disruptions across various scenarios.
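The transaction guarantees described above echo the classic saga pattern from distributed systems: each plan step pairs a forward action with a compensating action, and a validator checks global constraints after every step so a failed plan never commits partially. The Python sketch below is a minimal, hypothetical illustration of that idea; the class and function names are ours, not from the paper, and SagaLLM's actual context management agents and validation protocols are considerably more elaborate.

```python
# Hypothetical sketch, not the authors' implementation: a saga-style wrapper
# around multi-agent plan execution. Each step carries a compensation, and a
# validator (standing in for SagaLLM's validation agents) checks global
# constraints after every step; any failure rolls back completed steps.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[Dict], Dict]       # forward action: returns state updates
    compensate: Callable[[Dict], None]   # undoes the step's effects on the state


@dataclass
class SagaPlan:
    steps: List[SagaStep]
    validate: Callable[[Dict], bool]     # global constraint check after each step
    state: Dict = field(default_factory=dict)

    def run(self) -> Dict:
        completed: List[SagaStep] = []
        try:
            for step in self.steps:
                self.state.update(step.action(self.state))
                completed.append(step)
                if not self.validate(self.state):
                    raise ValueError(f"constraint violated after '{step.name}'")
        except Exception:
            # Compensate in reverse order so the plan never commits partially.
            for done in reversed(completed):
                done.compensate(self.state)
            raise
        return self.state


if __name__ == "__main__":
    # Toy example: book a flight, then a hotel, under a shared budget constraint.
    plan = SagaPlan(
        steps=[
            SagaStep("book_flight",
                     action=lambda s: {"spent": s.get("spent", 0) + 400},
                     compensate=lambda s: s.update(spent=s["spent"] - 400)),
            SagaStep("book_hotel",
                     action=lambda s: {"spent": s["spent"] + 700},
                     compensate=lambda s: s.update(spent=s["spent"] - 700)),
        ],
        validate=lambda s: s.get("spent", 0) <= 1000,  # budget must not be exceeded
    )
    try:
        plan.run()
    except ValueError as err:
        print("rolled back:", err, "| final state:", plan.state)
```

In this toy run, booking the hotel pushes spending to 1100 and violates the budget check, so both bookings are compensated and the state returns to a consistent value instead of leaving a half-executed plan behind.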
Related papers
- Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification [5.727096041675994]
Large Language Models (LLMs) have shown promise as robotic planners but often struggle with long-horizon and complex tasks.
We propose a neuro-symbolic approach that enhances LLM-based planners with Knowledge Graph-based RAG for hierarchical plan generation.
arXiv Detail & Related papers (2025-04-06T18:36:30Z) - Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems [31.894636711684523]
We propose a novel parallelized planning-acting framework for Multi-Agent Systems.
The proposed framework features a dual-thread architecture with interruptible execution to enable concurrent planning and acting.
arXiv Detail & Related papers (2025-03-05T13:53:10Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - MACI: Multi-Agent Collaborative Intelligence for Adaptive Reasoning and Temporal Planning [2.5200794639628032]
Multi-Agent Collaborative Intelligence (MACI) is a framework comprising three key components: 1) a meta-planner (MP) that identifies, formulates, and refines all roles and constraints of a task while generating a dependency graph, with common-sense augmentation to ensure realistic and practical constraints; 2) a collection of agents to facilitate planning and address task-specific requirements; and 3) a run-time monitor that manages plan adjustments as needed.
arXiv Detail & Related papers (2025-01-28T03:57:22Z) - PoAct: Policy and Action Dual-Control Agent for Generalized Applications [18.342339678035685]
This paper proposes Policy and Action Dual-Control Agent (PoAct) for generalized applications. PoAct aims to achieve higher-quality code actions and more accurate reasoning paths by dynamically switching reasoning policies and modifying the action space.
arXiv Detail & Related papers (2025-01-13T04:28:40Z) - Ontology-driven Prompt Tuning for LLM-based Task and Motion Planning [0.20940572815908076]
Task and Motion Planning (TAMP) approaches combine high-level symbolic planning with low-level motion planning.
LLMs are transforming task planning by offering natural language as an intuitive and flexible way to describe tasks.
This work proposes a novel prompt-tuning framework that employs knowledge-based reasoning to refine and expand user prompts.
arXiv Detail & Related papers (2024-12-10T13:18:45Z) - Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making.
Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance.
We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z) - Deliberate Reasoning in Language Models as Structure-Aware Planning with an Accurate World Model [14.480267340831542]
Structure-aware Planning with an Accurate World Model (SWAP) integrates structured knowledge representation with learned planning. We evaluate SWAP across diverse reasoning-intensive benchmarks, including math reasoning, logical reasoning, and coding tasks.
arXiv Detail & Related papers (2024-10-04T04:23:36Z) - On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability [59.72892401927283]
We evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks.
Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints.
arXiv Detail & Related papers (2024-09-30T03:58:43Z) - Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans: pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z) - Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, an LLM-based autonomous agent framework.
At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence.
We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z) - Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.