Fugu-MT Paper Translation (Abstract): Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models

Paper summary: Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models

arxiv url: http://arxiv.org/abs/2603.12344v1
Date: Thu, 12 Mar 2026 18:06:21 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:11.707429
Title: Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models
Title (reference translation): Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models
Authors: Khiem Le, Sreejata Dey, Marcos Martínez Galindo, Vanessa Lopez, Ting Hua, Nitesh V. Chawla, Hoang Thanh Lam
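Abstract summary: This work proposes TreeKD, a knowledge distillation method that transfers complementary knowledge from tree-based specialist models into large language models. The method trains decision trees on functional group features and verbalizes their learned predictive rules as natural language to enable rule-augmented context learning. Experiments on 22 ADMET properties from the TDC benchmark show that TreeKD substantially improves LLM performance.
Reference score (independently computed attention score): 28.15640069443448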
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Molecular Property Prediction (MPP) is a central task in drug discovery. While Large Language Models (LLMs) show promise as generalist models for MPP, their current performance remains below the threshold for practical adoption. We propose TreeKD, a novel knowledge distillation method that transfers complementary knowledge from tree-based specialist models into LLMs. Our approach trains specialist decision trees on functional group features, then verbalizes their learned predictive rules as natural language to enable rule-augmented context learning. This enables LLMs to leverage structural insights that are difficult to extract from SMILES strings alone. We further introduce rule-consistency, a test-time scaling technique inspired by bagging that ensembles predictions across diverse rules from a Random Forest. Experiments on 22 ADMET properties from the TDC benchmark demonstrate that TreeKD substantially improves LLM performance, narrowing the gap with SOTA specialist models and advancing toward practical generalist models for molecular property prediction.
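Abstract (reference translation): Molecular property prediction (MPP) is a central task in drug discovery. Large Language Models (LLMs) are promising as generalist models for MPP, but their current performance is below the threshold for practical adoption. This work proposes TreeKD, a new knowledge distillation method that transfers complementary knowledge from tree-based specialist models into LLMs. The method trains specialist decision trees on functional group features and verbalizes the learned predictive rules as natural language, enabling rule-augmented context learning. This lets LLMs exploit structural insights that are difficult to extract from SMILES strings alone. The work also introduces rule-consistency, a test-time scaling technique inspired by bagging that ensembles predictions across diverse rules from a Random Forest. Experiments on 22 ADMET properties from the TDC benchmark show that TreeKD substantially improves LLM performance, narrowing the gap with SOTA specialist models and moving toward practical generalist models for molecular property prediction.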
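
The two steps named in the abstract, verbalizing a decision-tree rule and ensembling predictions across rules, can be illustrated with a minimal sketch. This is not the authors' implementation. The nested-dict tree layout, the feature names, the wording of the rule, and the use of a simple majority vote are all assumptions made for illustration; the abstract does not specify the prompt format or the aggregation scheme. In a real pipeline the tree would be trained on functional group features, and the votes would be the LLM's answers under each rule.

```python
from collections import Counter

# Hypothetical toy tree over functional-group counts (assumption: a trained
# tree would normally be used; this one is hand-written for illustration).
TOY_TREE = {
    "feature": "hydroxyl",
    "threshold": 0.5,
    "left": {"label": "low"},
    "right": {
        "feature": "carboxylic acid",
        "threshold": 0.5,
        "left": {"label": "high"},
        "right": {"label": "low"},
    },
}


def verbalize_path(tree, features):
    """Walk the tree for one molecule; return (rule sentence, leaf label)."""
    conditions = []
    node = tree
    while "label" not in node:
        name, threshold = node["feature"], node["threshold"]
        value = features.get(name, 0)  # missing groups count as zero
        if value <= threshold:
            conditions.append(f"{name} count <= {threshold}")
            node = node["left"]
        else:
            conditions.append(f"{name} count > {threshold}")
            node = node["right"]
    rule = (
        "If " + " and ".join(conditions)
        + f", then the predicted class is {node['label']}."
    )
    return rule, node["label"]


def majority_vote(predictions):
    """Aggregate per-rule predictions by majority (assumed aggregation).

    Ties resolve to the label that appeared first in the input.
    """
    return Counter(predictions).most_common(1)[0][0]


# Usage: verbalize the rule that applies to one molecule's feature counts.
rule, label = verbalize_path(TOY_TREE, {"hydroxyl": 1, "carboxylic acid": 0})
print(rule)

# Placeholder votes standing in for LLM answers under three different rules.
print(majority_vote(["high", "low", "high"]))
```

The sentence returned by `verbalize_path` is the kind of text that could be placed in an LLM prompt next to the SMILES string. Repeating the call over several trees and voting over the resulting answers mirrors the bagging idea behind rule-consistency.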
