Can AI Agents Design and Implement Drug Discovery Pipelines?
- URL: http://arxiv.org/abs/2504.19912v1
- Date: Mon, 28 Apr 2025 15:41:28 GMT
- Title: Can AI Agents Design and Implement Drug Discovery Pipelines?
- Authors: Khachik Smbatyan, Tsolak Ghukasyan, Tigran Aghajanyan, Hovhannes Dabaghyan, Sergey Adamyan, Aram Bughdaryan, Vahagn Altunyan, Gagik Navasardyan, Aram Davtyan, Anush Hakobyan, Aram Gharibyan, Arman Fahradyan, Artur Hakobyan, Hasmik Mnatsakanyan, Narek Ginoyan, Garik Petrosyan,
- Abstract summary: Current AI agent-based systems demonstrate proficiency in solving programming challenges and conducting research.<n>This paper introduces DO Challenge, a benchmark designed to evaluate the decision-making abilities of AI agents.<n>We present the Deep Thought multi-agent system, which demonstrated strong performance on the benchmark, outperforming most human teams.
- Score: 1.5848629658789695
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rapid advancement of artificial intelligence, particularly autonomous agentic systems based on Large Language Models (LLMs), presents new opportunities to accelerate drug discovery by improving in-silico modeling and reducing dependence on costly experimental trials. Current AI agent-based systems demonstrate proficiency in solving programming challenges and conducting research, indicating an emerging potential to develop software capable of addressing complex problems such as pharmaceutical design and drug discovery. This paper introduces DO Challenge, a benchmark designed to evaluate the decision-making abilities of AI agents in a single, complex problem resembling virtual screening scenarios. The benchmark challenges systems to independently develop, implement, and execute efficient strategies for identifying promising molecular structures from extensive datasets, while navigating chemical space, selecting models, and managing limited resources in a multi-objective context. We also discuss insights from the DO Challenge 2025, a competition based on the proposed benchmark, which showcased diverse strategies explored by human participants. Furthermore, we present the Deep Thought multi-agent system, which demonstrated strong performance on the benchmark, outperforming most human teams. Among the language models tested, Claude 3.7 Sonnet, Gemini 2.5 Pro and o3 performed best in primary agent roles, and GPT-4o, Gemini 2.0 Flash were effective in auxiliary roles. While promising, the system's performance still fell short of expert-designed solutions and showed high instability, highlighting both the potential and current limitations of AI-driven methodologies in transforming drug discovery and broader scientific research.
Related papers
- Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models [75.4890331763196]
Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems.
LLM-based Agentic RS (LLM-ARS) can offer more interactive, context-aware, and proactive recommendations.
arXiv Detail & Related papers (2025-03-20T22:37:15Z) - Comprehensive Review of Reinforcement Learning for Medical Ultrasound Imaging [15.38820664844588]
This work aims to offer a thorough review of current research efforts, highlighting the potential of RL in building autonomous US solutions.
Existing surveys on advancements in the US scanning domain predominantly focus on partially autonomous solutions leveraging AI.
To bridge this gap, this review proposes a comprehensive taxonomy that integrates the stages of the US process with the RL development pipeline.
arXiv Detail & Related papers (2025-03-19T03:22:27Z) - Autonomous Microscopy Experiments through Large Language Model Agents [4.241267255764773]
Large language models (LLMs) have accelerated the development of self-driving laboratories (SDLs) for materials research.<n>Here, we introduce AILA (Artificially Intelligent Lab Assistant), a framework that automates atomic force microscopy (AFM) through LLM-driven agents.<n>Our systematic assessment shows that state-of-the-art language models struggle even with basic tasks such as documentation retrieval.
arXiv Detail & Related papers (2024-12-18T09:35:28Z) - DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration [24.65716292347949]
DrugAgent is a multi-agent framework that automates machine learning (ML) programming for drug discovery tasks.<n>Our results show that DrugAgent consistently outperforms leading baselines.
arXiv Detail & Related papers (2024-11-24T03:06:59Z) - ML Research Benchmark [0.0]
We present the ML Research Benchmark (MLRB), comprising 7 competition-level tasks derived from recent machine learning conference tracks.
This paper introduces a novel benchmark and evaluates it using agent scaffolds powered by frontier models, including Claude-3 and GPT-4o.
The results indicate that the Claude-3.5 Sonnet agent performs best across our benchmark, excelling in planning and developing machine learning models.
arXiv Detail & Related papers (2024-10-29T21:38:42Z) - Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose AOP, a novel framework for agent-oriented planning in multi-agent systems.
In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy.
Extensive experiments demonstrate the advancement of AOP in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.
arXiv Detail & Related papers (2024-10-03T04:07:51Z) - Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, a llm-based autonomous agent framework.
At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence.
We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z) - DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction [8.98329812378801]
DrugAgent is a multi-agent system for drug-target interaction prediction.<n>It combines multiple specialized perspectives with transparent reasoning.<n>Our approach provides detailed, human-interpretable reasoning for each prediction.
arXiv Detail & Related papers (2024-08-23T21:24:59Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery.
It includes 120 different challenge tasks spanning eight topics each with three levels of difficulty and several parametric variations.
We find that strong baseline agents, that perform well in prior published environments, struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z) - Generative AI Agent for Next-Generation MIMO Design: Fundamentals, Challenges, and Vision [76.4345564864002]
Next-generation multiple input multiple output (MIMO) is expected to be intelligent and scalable.
We propose the concept of the generative AI agent, which is capable of generating tailored and specialized contents.
We present two compelling case studies that demonstrate the effectiveness of leveraging the generative AI agent for performance analysis.
arXiv Detail & Related papers (2024-04-13T02:39:36Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.<n>The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.<n>Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - A Comprehensive Performance Study of Large Language Models on Novel AI
Accelerators [2.88634411143577]
Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems.
Specialized AI accelerator hardware systems have recently become available for accelerating AI applications.
arXiv Detail & Related papers (2023-10-06T21:55:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.