El Agente Gráfico: Structured Execution Graphs for Scientific Agents
- URL: http://arxiv.org/abs/2602.17902v1
- Date: Thu, 19 Feb 2026 23:47:05 GMT
- Title: El Agente Gráfico: Structured Execution Graphs for Scientific Agents
- Authors: Jiaru Bai, Abdulrahman Aldossary, Thomas Swanick, Marcel Müller, Yeonghun Kang, Zijian Zhang, Jin Won Lee, Tsz Wai Ko, Mohammad Ghazi Vakili, Varinia Bernales, Alán Aspuru-Guzik
- Abstract summary: We present El Agente Gráfico, a single-agent framework that embeds large language model (LLM)-driven decision-making within a type-safe execution environment. Central to our approach is a structured abstraction of scientific concepts and an object-graph mapper that represents computational state as typed Python objects. We evaluate the system by developing an automated benchmarking framework across a suite of university-level quantum chemistry tasks.
- Score: 7.47895130442454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used to automate scientific workflows, yet their integration with heterogeneous computational tools remains ad hoc and fragile. Current agentic approaches often rely on unstructured text to manage context and coordinate execution, generating often overwhelming volumes of information that may obscure decision provenance and hinder auditability. In this work, we present El Agente Gráfico, a single-agent framework that embeds LLM-driven decision-making within a type-safe execution environment and dynamic knowledge graphs for external persistence. Central to our approach is a structured abstraction of scientific concepts and an object-graph mapper that represents computational state as typed Python objects, stored either in memory or persisted in an external knowledge graph. This design enables context management through typed symbolic identifiers rather than raw text, thereby ensuring consistency, supporting provenance tracking, and enabling efficient tool orchestration. We evaluate the system by developing an automated benchmarking framework across a suite of university-level quantum chemistry tasks previously evaluated on a multi-agent system, demonstrating that a single agent, when coupled to a reliable execution engine, can robustly perform complex, multi-step, and parallel computations. We further extend this paradigm to two other large classes of applications: conformer ensemble generation and metal-organic framework design, where knowledge graphs serve as both memory and reasoning substrates. Together, these results illustrate how abstraction and type safety can provide a scalable foundation for agentic scientific automation beyond prompt-centric designs.
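The idea of managing context through typed symbolic identifiers rather than raw text can be illustrated with a minimal sketch. Everything below is hypothetical and is not the paper's API: the `Molecule` and `SinglePointResult` classes, the `ObjectStore` registry, the `run_single_point` tool, and the `Kind:n` identifier format are all assumptions made for illustration, and the energy value is a placeholder rather than a real calculation. The sketch only covers the in-memory case; persistence to an external knowledge graph is left out.

```python
from dataclasses import dataclass
from typing import Dict, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Molecule:
    """Hypothetical typed representation of a molecule."""
    smiles: str
    charge: int = 0


@dataclass(frozen=True)
class SinglePointResult:
    """Hypothetical result record; refers to its input by identifier."""
    molecule_id: str
    method: str
    energy_hartree: float


class ObjectStore:
    """In-memory registry that hands out typed symbolic identifiers."""

    def __init__(self) -> None:
        self._objects: Dict[str, object] = {}
        self._counters: Dict[str, int] = {}

    def put(self, obj: object) -> str:
        # Identifiers encode the type name, e.g. "Molecule:1".
        kind = type(obj).__name__
        count = self._counters.get(kind, 0) + 1
        self._counters[kind] = count
        ident = f"{kind}:{count}"
        self._objects[ident] = obj
        return ident

    def get(self, ident: str, expected: Type[T]) -> T:
        # Reject identifiers that resolve to the wrong type.
        obj = self._objects[ident]
        if not isinstance(obj, expected):
            raise TypeError(f"{ident} is not a {expected.__name__}")
        return obj


def run_single_point(store: ObjectStore, molecule_id: str, method: str) -> str:
    """Hypothetical tool: takes an identifier and returns an identifier."""
    mol = store.get(molecule_id, Molecule)
    # Placeholder value only; a real tool would call a chemistry package.
    placeholder_energy = -1.0 * len(mol.smiles)
    result = SinglePointResult(molecule_id, method, placeholder_energy)
    return store.put(result)


store = ObjectStore()
water_id = store.put(Molecule("O"))
result_id = run_single_point(store, water_id, "HF")
print(water_id, result_id)
```

In this sketch the agent only ever passes short identifiers such as `water_id` between tool calls, while the store keeps the full objects. Because each result records the identifier of its input, a provenance chain can be recovered by following `molecule_id` back through the store.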
Related papers
- Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows [3.0955233217110045]
We present Agentics 2.0, a lightweight, Python-native framework for building high-quality, structured, explainable, and type-safe agentic data workflows. At the core of Agentics 2.0, the logical algebra formalizes a large language model inference call as a typed semantic transformation. The proposed framework provides semantic reliability through strong typing, semantic observability, and evidence tracing.
arXiv Detail & Related papers (2026-03-04T16:30:01Z) - The Auton Agentic AI Framework [5.410458076724158]
The field of Artificial Intelligence is undergoing a transition from Generative AI to Agentic AI. This transition exposes a fundamental architectural mismatch: Large Language Models (LLMs) produce unstructured outputs, whereas the backend infrastructure they must control requires deterministic, schema-conformant inputs. The present paper describes the Auton Agentic AI Framework, a principled architecture for the creation and governance of autonomous agents.
arXiv Detail & Related papers (2026-02-27T06:42:08Z) - Monadic Context Engineering [59.95390010097654]
This paper introduces Monadic Context Engineering (MCE) to provide a formal foundation for agent design. We demonstrate how Monads enable robust composition, how Applicatives provide a principled structure for parallel execution, and crucially, how Monad Transformers allow for the systematic composition of these capabilities. This layered approach enables developers to construct complex, resilient, and efficient AI agents from simple, independently verifiable components.
arXiv Detail & Related papers (2025-12-27T01:52:06Z) - Synthesizing Procedural Memory: Challenges and Architectures in Automated Workflow Generation [0.5599792629509229]
This paper operationalizes the transition of Large Language Models from passive tool-users to active workflow architects. We demonstrate that by enforcing a scientific methodology of hypothesize, probe, and code, agents can autonomously write robust, production-grade code skills.
arXiv Detail & Related papers (2025-12-23T11:33:32Z) - An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z) - Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection [59.04089915447622]
ForenAgent is an interactive image forgery detection (IFD) framework that enables MLLMs to autonomously generate, execute, and refine Python-based low-level tools around the detection objective. Inspired by human reasoning, we design a dynamic reasoning loop comprising global perception, local focusing, iterative probing, and holistic adjudication. Experiments show that ForenAgent exhibits emergent tool-use competence and reflective reasoning on challenging IFD tasks.
arXiv Detail & Related papers (2025-12-18T08:38:44Z) - Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization [69.36509281190662]
Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions.
arXiv Detail & Related papers (2025-12-02T18:42:26Z) - LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering [90.84806758077536]
We introduce LoCoBench-Agent, a comprehensive evaluation framework specifically designed to assess large language model (LLM) agents in realistic, long-context software engineering. Our framework extends LoCoBench's 8,000 scenarios into interactive agent environments, enabling systematic evaluation of multi-turn conversations. Our framework provides agents with 8 specialized tools (file operations, search, code analysis) and evaluates them across context lengths ranging from 10K to 1M tokens.
arXiv Detail & Related papers (2025-11-17T23:57:24Z) - PADME: Procedure Aware DynaMic Execution [7.8148770419284865]
We introduce Procedure Aware DynaMic Execution (PADME), an agent framework that produces and exploits a graph-based representation of procedures. Unlike prior work that relies on manual graph construction or unstructured reasoning, PADME autonomously transforms procedural text into executable graphs. PADME achieves state-of-the-art performance on four diverse benchmarks, including ALFWorld and ScienceWorld.
arXiv Detail & Related papers (2025-10-13T11:15:49Z) - AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
AgentRouter is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals. By leveraging soft supervision and weighted aggregation of agent outputs, AgentRouter learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z) - State and Memory is All You Need for Robust and Reliable AI Agents [29.259008600842517]
Large language models (LLMs) have enabled powerful advances in natural language understanding and generation. Yet their application to complex, real-world scientific workflows remains limited by challenges in memory, planning, and tool integration. Here, we introduce SciBORG, a modular agentic framework that allows LLM-based agents to autonomously plan, reason, and achieve robust and reliable domain-specific task execution.
arXiv Detail & Related papers (2025-06-30T02:02:35Z) - Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories [17.975121612118752]
Large Language Model (LLM)-based agents are increasingly employed to automate complex software engineering tasks. We present a large-scale empirical study of the thought-action-result trajectories of three state-of-the-art LLM-based agents. We identify key trajectory characteristics, such as counts and token consumption, recurring action sequences, and the semantic coherence of thoughts, actions, and their results.
arXiv Detail & Related papers (2025-06-23T16:34:52Z) - Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research [32.92036657863354]
Language agents powered by large language models (LLMs) have demonstrated remarkable capabilities in understanding, reasoning, and executing complex tasks. However, developing robust agents presents significant challenges: substantial engineering overhead, lack of standardized components, and insufficient evaluation frameworks for fair comparison. We introduce Agent Graph-based Orchestration for Reasoning and Assessment (AGORA), a flexible abstraction framework that addresses these challenges.
arXiv Detail & Related papers (2025-05-30T08:46:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.