Autonomous Microscopy Experiments through Large Language Model Agents
- URL: http://arxiv.org/abs/2501.10385v2
- Date: Mon, 07 Jul 2025 13:21:44 GMT
- Title: Autonomous Microscopy Experiments through Large Language Model Agents
- Authors: Indrajeet Mandal, Jitendra Soni, Mohd Zaki, Morten M. Smedskjaer, Katrin Wondraczek, Lothar Wondraczek, Nitya Nand Gosvami, N. M. Anoop Krishnan,
- Abstract summary: Large language models (LLMs) are revolutionizing self driving laboratories (SDLs) for materials research.<n>We introduce Artificially Intelligent Lab Assistant (AILA), a framework automating atomic force microscopy through LLM driven agents.<n>We find that state of the art models struggle with basic tasks and coordination scenarios.
- Score: 4.241267255764773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are revolutionizing self driving laboratories (SDLs) for materials research, promising unprecedented acceleration of scientific discovery. However, current SDL implementations rely on rigid protocols that fail to capture the adaptability and intuition of expert scientists in dynamic experimental settings. We introduce Artificially Intelligent Lab Assistant (AILA), a framework automating atomic force microscopy through LLM driven agents. Further, we develop AFMBench a comprehensive evaluation suite challenging AI agents across the complete scientific workflow from experimental design to results analysis. We find that state of the art models struggle with basic tasks and coordination scenarios. Notably, Claude 3.5 sonnet performs unexpectedly poorly despite excelling in materials domain question answering (QA) benchmarks, revealing that domain specific QA proficiency does not necessarily translate to effective agentic capabilities. Additionally, we observe that LLMs can deviate from instructions, raising safety alignment concerns for SDL applications. Our ablations reveal that multi agent frameworks outperform single-agent architectures. We also observe significant prompt fragility, where slight modifications in prompt structure cause substantial performance variations in capable models like GPT 4o. Finally, we evaluate AILA's effectiveness in increasingly advanced experiments AFM calibration, feature detection, mechanical property measurement, graphene layer counting, and indenter detection. Our findings underscore the necessity for rigorous benchmarking protocols and prompt engineering strategies before deploying AI laboratory assistants in scientific research environments.
Related papers
- ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team [53.38438460574943]
IDVSCI is a multi-agent framework built on large language models (LLMs)<n>It incorporates two key innovations: a Dynamic Knowledge Exchange mechanism and a Dual-Diversity Review paradigm.<n>Results show that IDVSCI consistently achieves the best performance across two datasets.
arXiv Detail & Related papers (2025-06-23T07:12:08Z) - ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows [82.07367406991678]
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing.<n>Among these, computer-using agents are capable of interacting with operating systems as humans do.<n>We introduce ScienceBoard, which encompasses a realistic, multi-domain environment featuring dynamic and visually rich scientific software.
arXiv Detail & Related papers (2025-05-26T12:27:27Z) - Can AI Agents Design and Implement Drug Discovery Pipelines? [1.5848629658789695]
Current AI agent-based systems demonstrate proficiency in solving programming challenges and conducting research.
This paper introduces DO Challenge, a benchmark designed to evaluate the decision-making abilities of AI agents.
We present the Deep Thought multi-agent system, which demonstrated strong performance on the benchmark, outperforming most human teams.
arXiv Detail & Related papers (2025-04-28T15:41:28Z) - LLM Agent Swarm for Hypothesis-Driven Drug Discovery [2.7036595757881323]
PharmaSwarm is a unified multi-agent framework that orchestrates specialized "agents" to propose, validate, and refine hypotheses for novel drug targets and lead compounds.<n>By acting as an AI copilot, PharmaSwarm can accelerate translational research and deliver high-confidence hypotheses more efficiently than traditional pipelines.
arXiv Detail & Related papers (2025-04-24T22:27:50Z) - Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents [21.001278669360346]
We propose Curie, an AI framework designed to embed rigor into the experimentation process.<n>Curie includes an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability.<n>Compared to the strongest baseline tested, we achieve a 3.4$times$ improvement in correctly answering experimental questions.
arXiv Detail & Related papers (2025-02-22T03:58:19Z) - MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks.<n>This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents.<n>We evaluate a number of frontier large language models (LLMs) on our benchmarks such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro.
arXiv Detail & Related papers (2025-02-20T12:28:23Z) - DropMicroFluidAgents (DMFAs): Autonomous Droplet Microfluidic Research Framework Through Large Language Model Agents [0.6827423171182153]
This study demonstrates the effective use of Large language models (LLMs) in droplet microfluidics research.<n>The integration of DMFAs with the LLAMA3.1 model yielded the highest accuracy of 76.15%.<n>These capabilities enable their application across education and industrial support, driving greater efficiency in scientific discovery and innovation.
arXiv Detail & Related papers (2024-12-30T11:58:52Z) - Can Modern LLMs Act as Agent Cores in Radiology Environments? [54.36730060680139]
Large language models (LLMs) offer enhanced accuracy and interpretability across various domains.<n>This paper aims to investigate the pre-requisite question for building concrete radiology agents.<n>We present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents.<n>Second, we propose RadABench-EvalPlat, a novel evaluation platform for agents featuring a prompt-driven workflow.
arXiv Detail & Related papers (2024-12-12T18:20:16Z) - Agents for self-driving laboratories applied to quantum computing [2.840384720502993]
This paper introduces the k-agents framework, designed to support experimentalists in organizing laboratory knowledge and automating experiments with agents.<n>Our framework employs large language model-based agents to encapsulate laboratory knowledge including available laboratory operations and methods for analyzing experiment results.<n>To automate experiments, we introduce execution agents that break multi-step experimental procedures into state machines, interact with other agents to execute each step and analyze the experiment results.
arXiv Detail & Related papers (2024-12-10T23:30:44Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - ML Research Benchmark [0.0]
We present the ML Research Benchmark (MLRB), comprising 7 competition-level tasks derived from recent machine learning conference tracks.
This paper introduces a novel benchmark and evaluates it using agent scaffolds powered by frontier models, including Claude-3 and GPT-4o.
The results indicate that the Claude-3.5 Sonnet agent performs best across our benchmark, excelling in planning and developing machine learning models.
arXiv Detail & Related papers (2024-10-29T21:38:42Z) - Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance [95.03771007780976]
We tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions.
First, we collect real-world human activities to generate proactive task predictions.
These predictions are labeled by human annotators as either accepted or rejected.
The labeled data is used to train a reward model that simulates human judgment.
arXiv Detail & Related papers (2024-10-16T08:24:09Z) - MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents [10.86017322488788]
We present a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot)
It is designed to enhance machine learning research productivity through the automatic generation and implementation of research ideas using Large Language Model (LLM) agents.
We evaluate our framework on five machine learning research tasks and the experimental results show the framework's potential to facilitate the research progress and innovations.
arXiv Detail & Related papers (2024-08-26T05:55:48Z) - A Review of Large Language Models and Autonomous Agents in Chemistry [0.7184549921674758]
Large language models (LLMs) have emerged as powerful tools in chemistry.
This review highlights LLM capabilities in chemistry and their potential to accelerate scientific discovery through automation.
As agents are an emerging topic, we extend the scope of our review of agents beyond chemistry.
arXiv Detail & Related papers (2024-06-26T17:33:21Z) - DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations.
We find that strong baseline agents, that perform well in prior published environments, struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z) - AlabOS: A Python-based Reconfigurable Workflow Management Framework for Autonomous Laboratories [3.8330070166920556]
We introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources.
AlabOS features a reconfigurable experiment workflow model and a resource reservation mechanism, enabling the simultaneous execution of varied tasks.
We demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory, A-Lab, with around 3,500 samples synthesized over 1.5 years.
arXiv Detail & Related papers (2024-05-22T18:59:39Z) - CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments [51.41735920759667]
Large Language Models (LLMs) have shown promise in various tasks, but they often lack specific knowledge and struggle to accurately solve biological design problems.
In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments.
arXiv Detail & Related papers (2024-04-27T22:59:17Z) - MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal overhead while ensuring a high level of practitioner overhead.
arXiv Detail & Related papers (2024-02-21T14:22:20Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.
The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.
Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - A Comprehensive Performance Study of Large Language Models on Novel AI
Accelerators [2.88634411143577]
Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems.
Specialized AI accelerator hardware systems have recently become available for accelerating AI applications.
arXiv Detail & Related papers (2023-10-06T21:55:57Z) - MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation [96.71370747681078]
We introduce MLAgentBench, a suite of 13 tasks ranging from improving model performance on CIFAR-10 to recent research problems like BabyLM.
For each task, an agent can perform actions like reading/writing files, executing code, and inspecting outputs.
We benchmark agents based on Claude v1.0, Claude v2.1, Claude v3 Opus, GPT-4, GPT-4-turbo, Gemini-Pro, and Mixtral and find that a Claude v3 Opus agent is the best in terms of success rate.
arXiv Detail & Related papers (2023-10-05T04:06:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.