Related papers: GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols

URL: http://arxiv.org/abs/2512.06404v1
Date: Sat, 06 Dec 2025 11:28:35 GMT
Title: GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols
Authors: Mohammad Soleymanibrojeni, Roland Aydin, Diego Guedes-Sobrinho, Alexandre C. Dias, Maurício J. Piotrowski, Wolfgang Wenzel, Celso Ricardo Caldeira Rêgo
Abstract summary: GENIUS is an AI-agentic workflow that fuses a smart Quantum ESPRESSO knowledge graph with a tiered hierarchy of large language models supervised by a finite-state error-recovery machine. GENIUS translates free-form human-generated prompts into validated input files that run to completion on $\approx$80% of 295 diverse benchmarks, where 76% are autonomously repaired, with success decaying exponentially to a 7% baseline. The framework democratizes electronic-structure DFT simulations by intelligently automating protocol generation, validation, and repair, opening large-scale screening and accelerating ICME design loops across academia and industry worldwide.
Score: 32.505127447635864
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Predictive atomistic simulations have propelled materials discovery, yet routine setup and debugging still demand computer specialists. This know-how gap limits Integrated Computational Materials Engineering (ICME), where state-of-the-art codes exist but remain cumbersome for non-experts. We address this bottleneck with GENIUS, an AI-agentic workflow that fuses a smart Quantum ESPRESSO knowledge graph with a tiered hierarchy of large language models supervised by a finite-state error-recovery machine. Here we show that GENIUS translates free-form human-generated prompts into validated input files that run to completion on $\approx$80% of 295 diverse benchmarks, where 76% are autonomously repaired, with success decaying exponentially to a 7% baseline. Compared with LLM-only baselines, GENIUS halves inference costs and virtually eliminates hallucinations. The framework democratizes electronic-structure DFT simulations by intelligently automating protocol generation, validation, and repair, opening large-scale screening and accelerating ICME design loops across academia and industry worldwide.

Related papers

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks [62.031889234230725]
6G networks rely on complex cross-layer optimization. Manually translating high-level intents into mathematical formulations remains a bottleneck. We present ComAgent, a multi-LLM agentic AI framework.
arXiv Detail & Related papers (2026-01-27T13:43:59Z)
AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems [52.65695508605237]
We introduce AI-NativeBench, the first application-centric and white-box AI-Native benchmark suite grounded in Model Context Protocol (MCP) and Agent-to-Agent (A2A) standards. By treating agentic spans as first-class citizens within distributed traces, our methodology enables granular analysis of engineering characteristics beyond simple capabilities. This work provides the first systematic evidence to guide the transition from measuring model capability to engineering reliable AI-Native systems.
arXiv Detail & Related papers (2026-01-14T11:32:07Z)
SOFT: a high-performance simulator for universal fault-tolerant quantum circuits [5.744501987992456]
SOFT is a high-performance SimulatOr for universal Fault-Tolerant quantum circuits. Our work demonstrates the importance of reliable simulation tools for fault-tolerant architecture design.
arXiv Detail & Related papers (2025-12-28T18:28:56Z)
ReX-MLE: The Autonomous Agent Benchmark for Medical Imaging Challenges [5.886200278450183]
We introduce ReX-MLE, a benchmark of 20 challenges derived from high-impact medical imaging competitions. Unlike prior benchmarks, ReX-MLE evaluates full end-to-end workflows, requiring agents to independently manage data preprocessing, model training, and submission. We observe a severe performance gap: most submissions rank in the 0th percentile compared to human experts.
arXiv Detail & Related papers (2025-12-19T17:44:40Z)
ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms [4.235429894371577]
ATHENA is an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual problem. The framework achieves super-human performance, reaching validation errors of $10^{-14}$.
arXiv Detail & Related papers (2025-12-03T06:05:27Z)
R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science [70.1638335489284]
High-level machine learning engineering (MLE) tasks remain labor-intensive and iterative. We introduce R&D-Agent, a comprehensive, decoupled framework that formalizes the machine learning process. R&D-Agent divides MLE into two phases and six components, turning agent design for MLE into a principled, testable process.
arXiv Detail & Related papers (2025-05-20T06:07:00Z)
DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery [54.79763887844838]
Large language models (LLMs) integrated with autonomous agents hold significant potential for advancing scientific discovery through automated reasoning and task execution. We introduce DrugPilot, an LLM-based agent system with a parameterized reasoning architecture designed for end-to-end scientific workflows in drug discovery. DrugPilot significantly outperforms state-of-the-art agents such as ReAct and LoT, achieving task completion rates of 98.0%, 93.5%, and 64.0% for simple, multi-tool, and multi-turn scenarios, respectively.
arXiv Detail & Related papers (2025-05-20T05:18:15Z)
Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations [2.547250631115307]
Aitomia is a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. It is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations. Aitomia is expected to lower the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields.
arXiv Detail & Related papers (2025-05-13T03:11:41Z)
OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM [15.260794368585692]
We propose OR-LLM-Agent, an AI agent framework built on reasoning LLMs for automated Operations Research problem solving. We show that OR-LLM-Agent utilizing DeepSeek-R1 in its framework outperforms advanced methods, including GPT-o3, Gemini 2.5 Pro, DeepSeek-R1, and ORLM, by at least 7% in accuracy.
arXiv Detail & Related papers (2025-03-13T03:40:50Z)
Convolutional vs Large Language Models for Software Log Classification in Edge-Deployable Cellular Network Testing [0.0]
We propose a compact convolutional neural network (CNN) architecture that offers a context window spanning up to 200,000 characters. Our model is capable of identifying defects in test runs and triaging them to the relevant department, formerly a manual engineering process. Our model is deployable on edge devices without dedicated hardware and widely applicable across software logs in various industries.
arXiv Detail & Related papers (2024-07-04T09:12:08Z)
Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis. The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions. Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the loop, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
The Basis of Design Tools for Quantum Computing: Arrays, Decision Diagrams, Tensor Networks, and ZX-Calculus [55.58528469973086]
Quantum computers promise to efficiently solve important problems classical computers never will. A fully automated quantum software stack needs to be developed. This work provides a look "under the hood" of today's tools and showcases how these means are utilized in them, e.g., for simulation, compilation, and verification of quantum circuits.
arXiv Detail & Related papers (2023-01-10T19:00:00Z)
TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance. One direction aims at the recognition of recurring anomaly types to enable remediation automation. We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.