Related papers: Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations

Related papers

AI-Enhanced Business Process Automation: A Case Study in the Insurance Domain Using Object-Centric Process Mining [0.7124736158080938]
This paper presents a case study from the insurance sector, where an LLM was deployed to automate the identification of claim parts. We apply Object-Centric Process Mining (OCPM) to assess the impact of AI-driven automation on process scalability. Our findings indicate that while LLMs significantly enhance operational capacity, they also introduce new process dynamics that require further refinement.
arXiv Detail & Related papers (2025-04-24T06:43:29Z)
Complex LLM Planning via Automated Heuristics Discovery [48.07520536415374]
We consider enhancing large language models (LLMs) for complex planning tasks. We propose automated inferences discovery (AutoHD), a novel approach that enables LLMs to explicitly generate functions to guide-time search. Our proposed method requires no additional model training or finetuning--and the explicit definition of functions generated by the LLMs provides interpretability and insights into the reasoning process.
arXiv Detail & Related papers (2025-02-26T16:52:31Z)
MDCrow: Automating Molecular Dynamics Workflows with Large Language Models [0.6130124744675498]
We introduce MDCrow, an agentic LLM assistant capable of automating Molecular dynamics simulations. We assess MDCrow's performance across 25 tasks of varying required subtasks and difficulty, and we evaluate the agent's robustness to both difficulty and prompt style.
arXiv Detail & Related papers (2025-02-13T18:19:20Z)
LLM-Agents Driven Automated Simulation Testing and Analysis of small Uncrewed Aerial Systems [11.183147511573717]
Thorough simulation testing is crucial for validating the correct behavior of small Uncrewed Aerial Systems. Various sUAS simulation tools exist to support developers, but the entire process of creating, executing, and analyzing simulation tests remains a largely manual and cumbersome task. We propose AutoSimTest, a framework where multiple LLM agents collaborate to support the sUAS simulation testing process.
arXiv Detail & Related papers (2025-01-21T03:42:21Z)
The Potential of LLMs in Automating Software Testing: From Generation to Reporting [0.0]
Manual testing, while effective, can be time consuming and costly, leading to an increased demand for automated methods. Recent advancements in Large Language Models (LLMs) have significantly influenced software engineering. This paper explores an agent-oriented approach to automated software testing, using LLMs to reduce human intervention and enhance testing efficiency.
arXiv Detail & Related papers (2024-12-31T02:06:46Z)
Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework [1.4255659581428337]
We propose a feedback-driven, multi-agent framework for managing simulations in power systems. This framework achieves success rates of 93.13% and 96.85%, respectively, on 69 diverse tasks from Daline and MATPOWER. It also supports rapid, cost-effective task execution, completing each simulation in approximately 30 seconds at an average cost of 0.014 USD for tokens.
arXiv Detail & Related papers (2024-11-21T19:01:07Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
CycleResearcher: Improving Automated Research via Automated Review [37.03497673861402]
This paper explores the possibility of using open-source post-trained large language models (LLMs) as autonomous agents capable of performing the full cycle of automated research and review. To train these models, we develop two new datasets, reflecting real-world machine learning research and peer review dynamics. In research, the papers generated by the CycleResearcher model achieved a score of 5.36 in simulated peer reviews, surpassing the preprint level of 5.24 from human experts and approaching the accepted paper level of 5.69.
arXiv Detail & Related papers (2024-10-28T08:10:21Z)
AutoFLUKA: A Large Language Model Based Framework for Automating Monte Carlo Simulations in FLUKA [6.571041942559539]
Monte Carlo (MC) simulations are essential for replicating real-world scenarios across scientific and engineering fields. Despite the robustness and versatility, FLUKA faces significant limitations in automation and integration with external post-processing tools. This study explores the potential of Large Language Models (LLMs) and AI agents to address these limitations. We introduce AutoFLUKA, an AI agent application developed using the LangChain Python Framework to automate typical MC simulation in FLUKA.
arXiv Detail & Related papers (2024-10-19T21:50:11Z)
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents [10.86017322488788]
We present a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot) It is designed to enhance machine learning research productivity through the automatic generation and implementation of research ideas using Large Language Model (LLM) agents. We evaluate our framework on five machine learning research tasks and the experimental results show the framework's potential to facilitate the research progress and innovations.
arXiv Detail & Related papers (2024-08-26T05:55:48Z)
Enabling Large Language Models to Perform Power System Simulations with Previously Unseen Tools: A Case of Daline [1.4255659581428337]
This work proposes a modular framework that integrates expertise from both the power system and large language models. It improves GPT-4o's simulation coding accuracy from 0% to 96.07%, also outperforming the ChatGPT-4o web interface's 33.8% accuracy.
arXiv Detail & Related papers (2024-06-25T02:05:26Z)
Automatic benchmarking of large multimodal models via iterative experiment programming [71.78089106671581]
We present APEx, the first framework for automatic benchmarking of LMMs. Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand. The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
arXiv Detail & Related papers (2024-06-18T06:43:46Z)
Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning [0.9110413356918055]
This research pioneers the use of fine-tuned Large Language Models (LLMs) to automate Systematic Literature Reviews ( SLRs) Our study employed the latest fine-tuning methodologies together with open-sourced LLMs, and demonstrated a practical and efficient approach to automating the final execution stages of an SLR process. The results maintained high fidelity in factual accuracy in LLM responses, and were validated through the replication of an existing PRISMA-conforming SLR.
arXiv Detail & Related papers (2024-04-08T00:08:29Z)
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics [51.17512229589]
PoLLMgraph is a model-based white-box detection and forecasting approach for large language models. We show that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics. Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
arXiv Detail & Related papers (2024-04-06T20:02:20Z)
Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving. We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
The Foundations of Computational Management: A Systematic Approach to Task Automation for the Integration of Artificial Intelligence into Existing Workflows [55.2480439325792]
This article introduces Computational Management, a systematic approach to task automation. The article offers three easy step-by-step procedures to begin the process of implementing AI within a workflow.
arXiv Detail & Related papers (2024-02-07T01:45:14Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis. The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions. Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
ProAgent: From Robotic Process Automation to Agentic Process Automation [87.0555252338361]
Large Language Models (LLMs) have emerged human-like intelligence. This paper introduces Agentic Process Automation (APA), a groundbreaking automation paradigm using LLM-based agents for advanced automation. We then instantiate ProAgent, an agent designed to craft from human instructions and make intricate decisions by coordinating specialized agents.
arXiv Detail & Related papers (2023-11-02T14:32:16Z)
A Survey on Large Language Model based Autonomous Agents [105.2509166861984]
Large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This paper delivers a systematic review of the field of LLM-based autonomous agents from a holistic perspective. We present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering.
arXiv Detail & Related papers (2023-08-22T13:30:37Z)
Closing the loop: Autonomous experiments enabled by machine-learning-based online data analysis in synchrotron beamline environments [80.49514665620008]
Machine learning can be used to enhance research involving large or rapidly generated datasets. In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR) We present solutions that provide an elementary data analysis in real time during the experiment without introducing the additional software dependencies in the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.