Fugu-MT Paper Translation (Abstract): Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

Paper abstract: Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

arxiv url: http://arxiv.org/abs/2605.08255v1
Date: Thu, 07 May 2026 19:39:40 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:49.504978
Title: Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?
Authors: Yuchu Liu, Rui Zhu, Jingwei Xiong, Haixu Tang
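Abstract summary: PolyLM is a natural-language-only framework that predicts materials performance directly from full-text literature. The authors curated 185,000 scientific papers and over 276,400 unique polymer samples across 22 physical, mechanical, and thermal properties. The model achieves remarkably high predictive accuracy, establishing new state-of-the-art benchmarks for complex properties.
Reference score (attention score computed independently by this site): 6.991343316028922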
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Can large language models predict physical and mechanical polymer properties simply by reading unstructured scientific prose? Polymer performance is rarely determined by chemical structure alone; identical nominal polymers can exhibit drastically different behaviors depending on their synthesis route, processing history, morphology, and testing conditions. Yet, state-of-the-art polymer property models typically rely on structure-only representations (such as SMILES or molecular graphs), which strip away this vital experimental context. In this work, we introduce PolyLM, a natural-language-only, process- and condition-aware framework that predicts materials performance directly from full-text literature. By circumventing structural inputs entirely, PolyLM preserves the nuanced, unstructured descriptions of synthesis and processing reported by domain scientists. To train this framework, we curated an unprecedented, literature-scale dataset encompassing 185,000 scientific papers and over 276,400 unique polymer samples across 22 physical, mechanical, and thermal properties. We fine-tuned a massive 9-billion-parameter language model (Qwen3.5-9B) using Low-Rank Adaptation (LoRA) and task-level uncertainty weighting. Evaluated on 68,283 held-out observations, the model achieves remarkably high predictive accuracy, establishing new state-of-the-art benchmarks for complex properties. Across the 22 diverse targets, the model achieves a median $R^2$ of 0.74, with predictions for key thermal, mechanical, and physicochemical properties frequently surpassing an $R^2$ of 0.80. These results unequivocally demonstrate that natural language is a powerful, highly scalable interface for realistic materials performance prediction.
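Note on the reported metric: the abstract summarizes performance as a median $R^2$ across 22 property targets. The sketch below shows one common way such a summary could be computed, using the standard coefficient of determination per property and then taking the median. The function names, property names, and numbers are hypothetical and for illustration only; the abstract does not describe the authors' evaluation code.

```python
from statistics import median


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def median_r_squared(per_property):
    """per_property maps a property name to a (y_true, y_pred) pair of lists.

    Returns the median R^2 over properties and the per-property scores.
    """
    scores = {name: r_squared(t, p) for name, (t, p) in per_property.items()}
    return median(scores.values()), scores


# Toy, made-up values purely for illustration (not data from the paper).
toy = {
    "glass_transition_temperature": ([100.0, 150.0, 200.0], [110.0, 150.0, 190.0]),
    "tensile_strength": ([10.0, 20.0, 30.0], [10.0, 20.0, 30.0]),
    "density": ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),
}
med, scores = median_r_squared(toy)
print(med, scores)
```

A median is less sensitive than a mean to a few poorly predicted properties, which is presumably why it is a convenient single-number summary across heterogeneous targets.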
