Related papers: ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering

URL: http://arxiv.org/abs/2602.23193v1
Date: Thu, 26 Feb 2026 16:45:59 GMT
Title: ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering
Authors: Elzo Brito dos Santos Filho
Abstract summary: This paper presents the ESAA (Event Sourcing for Autonomous Agents) architecture. The architecture separates the cognitive intention from the project's state mutation, inspired by the Event Sourcing pattern. Two case studies validate the architecture, providing empirical evidence of the architecture's scalability beyond single-agent scenarios.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous agents based on Large Language Models (LLMs) have evolved from reactive assistants to systems capable of planning, executing actions via tools, and iterating over environment observations. However, they remain vulnerable to structural limitations: lack of native state, context degradation over long horizons, and the gap between probabilistic generation and deterministic execution requirements. This paper presents the ESAA (Event Sourcing for Autonomous Agents) architecture, which separates the agent's cognitive intention from the project's state mutation, inspired by the Event Sourcing pattern. In ESAA, agents emit only structured intentions in validated JSON (agent.result or issue.report); a deterministic orchestrator validates, persists events in an append-only log (activity.jsonl), applies file-writing effects, and projects a verifiable materialized view (roadmap.json). The proposal incorporates boundary contracts (AGENT_CONTRACT.yaml), metaprompting profiles (PARCER), and replay verification with hashing (esaa verify), ensuring the immutability of completed tasks and forensic traceability. Two case studies validate the architecture: (i) a landing page project (9 tasks, 49 events, single-agent composition) and (ii) a clinical dashboard system (50 tasks, 86 events, 4 concurrent agents across 8 phases), both concluding with run.status=success and verify_status=ok. The multi-agent case study demonstrates real concurrent orchestration with heterogeneous LLMs (Claude Sonnet 4.6, Codex GPT-5, Antigravity/Gemini 3 Pro, and Claude Opus 4.6), providing empirical evidence of the architecture's scalability beyond single-agent scenarios.
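The flow described in the abstract (validated intentions, an append-only log, a projected view, and replay verification by hash) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the field names (`type`, `task_id`, `seq`), the in-memory list standing in for activity.jsonl, the view layout standing in for roadmap.json, and every function name are assumptions made for this example. Only the two intention types, agent.result and issue.report, come from the abstract.

```python
import hashlib
import json

# Intention types named in the abstract; everything else here is hypothetical.
ALLOWED_TYPES = {"agent.result", "issue.report"}


def validate(intention: dict) -> None:
    """Reject intentions that do not match the assumed minimal schema."""
    if intention.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"unknown intention type: {intention.get('type')!r}")
    if not isinstance(intention.get("task_id"), str):
        raise ValueError("task_id must be a string")


def append_event(log: list[str], intention: dict) -> None:
    """Validate an intention, then append it to the log as one JSON line."""
    validate(intention)
    event = {**intention, "seq": len(log) + 1}  # orchestrator assigns the order
    log.append(json.dumps(event, sort_keys=True))


def project(log: list[str]) -> dict:
    """Replay the log from the start to rebuild the materialized view."""
    view = {"tasks": {}, "issues": []}
    for line in log:
        event = json.loads(line)
        if event["type"] == "agent.result":
            view["tasks"][event["task_id"]] = "done"
        else:  # issue.report
            view["issues"].append(event["task_id"])
    return view


def view_hash(view: dict) -> str:
    """Hash a canonical JSON serialization of the view."""
    canonical = json.dumps(view, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(log: list[str], expected_hash: str) -> bool:
    """Replay the log and compare the resulting view hash to the stored one."""
    return view_hash(project(log)) == expected_hash


log: list[str] = []
append_event(log, {"type": "agent.result", "task_id": "T-001"})
append_event(log, {"type": "issue.report", "task_id": "T-002"})
stored = view_hash(project(log))
print(verify(log, stored))
```

Because the view is derived only from the log, replaying the same log always yields the same hash, while a dropped or altered event changes the projection and makes the check fail. In this sketch the agent never writes state directly; it only hands a dictionary to `append_event`.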

Related papers

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing [10.47562113256175]
This article outlines the networking foundations needed to make such collaboration practical. We propose a plane-based reference architecture that decouples connectivity/identity, semantic discovery, and execution. We also present a tiered verification spectrum: Tier1 relies on reputation signals, Tier2 applies lightweight canary challenge-response with fallback selection, and Tier3 requires evidence packages such as signed tool receipts/traces.
arXiv Detail & Related papers (2026-03-04T05:58:44Z)
ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems [25.131570054560353]
Current agentic frameworks underperform on long-horizon tasks. We introduce ROMA, a domain-agnostic framework that addresses these limitations. We show that ROMA, combined with GEPA+, delivers leading system-level performance on reasoning and long-form generation benchmarks.
arXiv Detail & Related papers (2026-02-02T09:20:59Z)
TriCEGAR: A Trace-Driven Abstraction Mechanism for Agentic AI [5.1181001367075]
TriCEGAR is a trace-driven abstraction mechanism that automates state construction from execution logs. We describe a framework-native implementation that captures typed agent lifecycle events and builds abstractions from traces. We also show how run likelihoods enable anomaly detection as a guardrailing signal.
arXiv Detail & Related papers (2026-01-30T14:01:47Z)
Veri-Sure: A Contract-Aware Multi-Agent Framework with Temporal Tracing and Formal Verification for Correct RTL Code Generation [4.723302382132762]
Silicon-grade correctness remains bottlenecked by: (i) limited test coverage and reliability of simulation-centric evaluation, (ii) regressions and repair hallucinations, and (iii) semantic drift as intent is reinterpreted across agent handoffs. We propose Veri-Sure, a multi-agent framework that establishes a design contract to align agents' intent and uses a patching mechanism guided by static dependency slicing to perform precise, localized repairs.
arXiv Detail & Related papers (2026-01-27T16:10:23Z)
Sponge Tool Attack: Stealthy Denial-of-Efficiency against Tool-Augmented Agentic Reasoning [58.432996881401415]
Recent work augments large language models (LLMs) with external tools to enable agentic reasoning. We propose Sponge Tool Attack (STA), which disrupts agentic reasoning solely by rewriting the input prompt. STA generates benign-looking prompt rewrites from the original one with high semantic fidelity.
arXiv Detail & Related papers (2026-01-24T19:36:51Z)
A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge [1.932555230783329]
Lightweight, open-source Python framework designed to democratize the construction of LLM-driven autonomous agents. AgentForge introduces three key innovations: (1) a composable skill abstraction that enables fine-grained task decomposition with formally defined input-output contracts, (2) a unified backend interface supporting seamless switching between cloud-based APIs and local inference engines, and (3) a declarative YAML-based configuration system that separates agent logic from implementation details.
arXiv Detail & Related papers (2026-01-19T20:33:26Z)
The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check [54.08619694620588]
We present a comprehensive evaluation of dLLMs across two distinct agentic paradigms: Embodied Agents and Tool-Calling Agents. Our results on Agentboard and BFCL reveal a "bitter lesson": current dLLMs fail to serve as reliable agentic backbones.
arXiv Detail & Related papers (2026-01-19T11:45:39Z)
BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents [58.83028403414688]
Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs. We propose BackdoorAgent, a modular and stage-aware framework that provides a unified agent-centric view of backdoor threats in LLM agents.
arXiv Detail & Related papers (2026-01-08T03:49:39Z)
Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration. We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress. Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z)
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems [48.971606069204825]
DoVer is an intervention-driven debugging framework for large language model (LLM)-based multi-agent systems. It augments hypothesis generation with active verification through targeted interventions. DoVer flips 18-28% of failed trials into successes, achieves up to 16% milestone progress, and validates or refutes 30-60% of failure hypotheses.
arXiv Detail & Related papers (2025-12-07T09:23:48Z)
Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction [21.08753833036094]
We present Agent-Event-Coder (AEC), a novel multi-agent framework that treats event extraction like software engineering. AEC decomposes ZSEE into specialized subtasks (retrieval, planning, coding, and verification), each handled by a dedicated LLM agent. Experiments across five diverse domains and six LLMs demonstrate that AEC consistently outperforms prior zero-shot baselines.
arXiv Detail & Related papers (2025-11-17T08:17:15Z)
Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection [108.5042835056188]
This work introduces Agent4FaceForgery to address two fundamental problems: how to capture the diverse intents and iterative processes of human forgery creation, and how to model the complex, often adversarial, text-image interactions that accompany forgeries in social media.
arXiv Detail & Related papers (2025-09-16T01:05:01Z)
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling [83.78874399606379]
We propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling. It comprises four distinct small-scale agents with clearly defined roles and effective collaboration. It shows superior performance at a smaller parameter scale without sacrificing ability on general and mathematical tasks.
arXiv Detail & Related papers (2025-08-05T12:52:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.