Fugu-MT Paper Translation (Abstract): SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?

Paper summary: SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?

arxiv url: http://arxiv.org/abs/2605.21740v2
Date: Sun, 24 May 2026 00:44:41 GMT
Status: Translation complete
System update date: 2026-05-26 16:32:37.945256
Title: SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?
Title (reference translation): SMDD-Bench: Can LLMs solve real-world small molecule drug design tasks?
Authors: Kevin Han, Renfei Zhang, Kathy Wei, Hamed Mahdavi, Niloofar Mireshghallah, Amir Barati Farimani
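Abstract summary: LLM agents have remarkable potential for scientific discovery applications. Current evaluation methods are ad hoc and too simple for real-world discovery. We benchmark 7 open and closed source LLMs and find that even the most performant LLM, GPT5.4, solves only 40.2% of tasks.
Reference score (independently computed attention score): 14.919492548107234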
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM agents have incredible potential for scientific discovery applications. However, the performance of LLM agents on real-world small molecule drug design (SMDD) tasks across diverse chemistries and targets is unclear. Current evaluation methods are either ad hoc, too simple for real-world discovery, limited in scale, or restricted to single-turn question answering. In an effort to standardize the evaluation of LLM agents on small molecule design, we introduce SMDD-Bench, a challenging, multi-turn, long-horizon agentic benchmark consisting of 502 guaranteed-solvable task instances spanning 5 task types: 2D Pharmacophore Identification, Interaction Point Discovery, Scaffold Hopping, Lead Optimization, and Fragment Assembly. SMDD-Bench tasks span a wide region of chemical space and involve 102 unique protein targets. Completely solving the benchmark would require strong chemical and biological reasoning and 3D intuition, an understanding of specialized tool use, and planning expertise over a limited number of oracle calls. We benchmark 7 frontier open and closed source LLMs and find that even the most performant LLM, GPT5.4, solves only 40.2% of tasks. We hope SMDD-Bench provides a standardized testbed to invigorate the field towards training and evaluating LLM agents for fully autonomous computational drug design. We host a public leaderboard at smddbench.com.
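Abstract (reference translation): LLM agents have remarkable potential for scientific discovery applications. However, the performance of LLM agents on real-world small molecule drug design (SMDD) tasks across diverse chemistries and targets remains unclear. Current evaluation methods are ad hoc and too simple for real-world discovery. To standardize the evaluation of LLM agents on small molecule design, we introduce SMDD-Bench, a challenging, multi-turn, long-horizon agentic benchmark consisting of 502 guaranteed-solvable task instances spanning 5 task types. SMDD-Bench tasks span a wide region of chemical space and involve 102 unique protein targets. Completely solving the benchmark requires strong chemical and biological reasoning and 3D intuition, an understanding of specialized tool use, and planning expertise over a limited number of oracle calls. We benchmark 7 open and closed source LLMs and find that even the most performant LLM, GPT5.4, solves only 40.2% of tasks. We hope SMDD-Bench provides a standardized testbed to invigorate the field of training and evaluating LLM agents for fully autonomous drug design. We host a public leaderboard at smddbench.com.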


Related papers