A Cloud-based Multi-Agentic Workflow for Science
- URL: http://arxiv.org/abs/2601.12607v1
- Date: Sun, 18 Jan 2026 22:37:09 GMT
- Title: A Cloud-based Multi-Agentic Workflow for Science
- Authors: Anurag Acharya, Timothy Vega, Rizwan A. Ashraf, Anshu Sharma, Derek Parker, Robert Rallo,
- Abstract summary: Large Language Models (LLMs) become ubiquitous across various scientific domains.<n>Their lack of ability to perform complex tasks like running simulations or to make complex decisions limits their utility.<n>We present a domain-agnostic, model-independent workflow for an agentic framework that can act as a scientific assistant while being run entirely on cloud.
- Score: 0.12314765641075438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Large Language Models (LLMs) become ubiquitous across various scientific domains, their lack of ability to perform complex tasks like running simulations or to make complex decisions limits their utility. LLM-based agents bridge this gap due to their ability to call external resources and tools and thus are now rapidly gaining popularity. However, coming up with a workflow that can balance the models, cloud providers, and external resources is very challenging, making implementing an agentic system more of a hindrance than a help. In this work, we present a domain-agnostic, model-independent workflow for an agentic framework that can act as a scientific assistant while being run entirely on cloud. Built with a supervisor agent marshaling an array of agents with individual capabilities, our framework brings together straightforward tasks like literature review and data analysis with more complex ones like simulation runs. We describe the framework here in full, including a proof-of-concept system we built to accelerate the study of Catalysts, which is highly important in the field of Chemistry and Material Science. We report the cost to operate and use this framework, including the breakdown of the cost by services use. We also evaluate our system on a custom-curated synthetic benchmark and a popular Chemistry benchmark, and also perform expert validation of the system. The results show that our system is able to route the task to the correct agent 90% of the time and successfully complete the assigned task 97.5% of the time for the synthetic tasks and 91% of the time for real-world tasks, while still achieving better or comparable accuracy to most frontier models, showing that this is a viable framework for other scientific domains to replicate.
Related papers
- El Agente Gráfico: Structured Execution Graphs for Scientific Agents [7.47895130442454]
We present El Agente Grfico, a single-agent framework that embeds large language models (LLMs)-driven decision-making within a type-safe execution environment.<n>Central to our approach is a structured abstraction of scientific concepts and an object-graph mapper that represents computational state as typed Python objects.<n>We evaluate the system by developing an automated benchmarking framework across a suite of university-level quantum chemistry tasks.
arXiv Detail & Related papers (2026-02-19T23:47:05Z) - Bohrium + SciMaster: Building the Infrastructure and Ecosystem for Agentic Science at Scale [82.20980951765891]
We argue that scaling agentic science requires an infrastructure-and-ecosystem approach, instantiated Bohrium+SciMaster.<n>Bohrium acts as a managed, traceable hub for AI4S assets that turns diverse scientific data, software, compute, and laboratory systems into agent-ready capabilities.<n>SciMaster orchestrates these capabilities into long-horizon scientific, on which scientific agents can be composed and executed.
arXiv Detail & Related papers (2025-12-23T16:04:41Z) - An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery.<n>Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments.<n>Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z) - Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization [69.36509281190662]
Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck.<n>We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design.<n>We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions.
arXiv Detail & Related papers (2025-12-02T18:42:26Z) - LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering [90.84806758077536]
We introduce textbfLoCoBench-Agent, a comprehensive evaluation framework specifically designed to assess large language models (LLMs) agents in realistic, long-context software engineering.<n>Our framework extends LoCoBench's 8,000 scenarios into interactive agent environments, enabling systematic evaluation of multi-turn conversations.<n>Our framework provides agents with 8 specialized tools (file operations, search, code analysis) and evaluates them across context lengths ranging from 10K to 1M tokens.
arXiv Detail & Related papers (2025-11-17T23:57:24Z) - ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows [82.07367406991678]
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing.<n>Among these, computer-using agents are capable of interacting with operating systems as humans do.<n>We introduce ScienceBoard, which encompasses a realistic, multi-domain environment featuring dynamic and visually rich scientific software.
arXiv Detail & Related papers (2025-05-26T12:27:27Z) - El Agente: An Autonomous Agent for Quantum Chemistry [3.6593051631801106]
El Agente Q is a multi-agent system that generates and executes quantum chemistry from natural language user prompts.<n>El Agente Q is benchmarked on six university-level course exercises and two case studies, demonstrating robust problem-solving performance.
arXiv Detail & Related papers (2025-05-05T09:07:22Z) - WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.<n>WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.