Fugu-MT paper translation (abstract): It's All Connected: Topology-Aware Structural Graph Encoding Improves Performance on Polymer Prediction

Paper overview: It's All Connected: Topology-Aware Structural Graph Encoding Improves Performance on Polymer Prediction

arxiv url: http://arxiv.org/abs/2605.10551v1
Date: Mon, 11 May 2026 13:28:40 GMT
Status: translation complete
Last updated in system: 2026-05-12 23:28:50.84898
Title: It's All Connected: Topology-Aware Structural Graph Encoding Improves Performance on Polymer Prediction
Title (reference translation): Improving polymer prediction performance through topology-aware structural graph encoding
Authors: H. Ibrahim Erdogan, Punith Raviswamy, Nikita Agrawal, Yannik Köster, Stefan Zechel, Ulrich S. Schubert, Ruben Mayer, Christopher Kuenneth
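Abstract summary: Graph neural networks (GNNs) have achieved strong results in molecular property prediction, but polymers pose distinct challenges. Labeled datasets are scarce and small (typically on the order of hundreds of polymers) because they require expensive experiments. The authors propose a principled graph construction that encodes chain-scale topology, addressing the gap left by repeat-unit-only representations.
Reference score (independently computed attention score): 2.1908196294783004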
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph Neural Networks (GNNs) have achieved strong results in molecular property prediction, but polymers present distinct challenges: labeled datasets are scarce and small (typically in the order of hundreds of polymers) due to the need for expensive experimentation, and complex polymer chain distributions influence polymer properties. Established practice in polymer prediction represents polymers solely by graphs of their repeat units, discarding the chain-scale morphology that governs key properties such as the glass transition temperature ($T_g$). In this work, we propose a principled graph construction that addresses this gap. Given a polymer's molecular mass distribution (MMD), we sample representative chains from the Schulz-Zimm distribution and construct representative sets of large graphs encoding chain-scale topology directly, with atoms and bonds featurized using rich chemical descriptors. We further pretrain GNN encoders via masked graph modeling on 100,000 unlabeled PSMILES strings before fine-tuning on labeled data. On a dataset of 381 polymers (180 homopolymers and 201 copolymers), we show that graph construction and self-supervised pretraining are jointly necessary: without pretraining, the large graph method matches the repeat-unit baseline (28.40 K vs. 28.36 K RMSE); with pretraining, it achieves 24.76 K +/- 3.30 K, a 5.1% reduction in mean error over the pretrained repeat-unit baseline (26.08 K +/- 4.20 K, p < 0.001, 30 runs). An ablation removing chemical features degrades performance to 36.65 K, confirming both components are essential. Results are architecture-agnostic, holding for both GINE and GATv2 encoders.
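The chain-sampling step can be illustrated with a minimal sketch. It assumes the common parametrization in which the Schulz-Zimm number distribution of chain masses is a gamma distribution with shape k = 1 / (Đ - 1) and mean M_n, where Đ = M_w / M_n is the dispersity. The abstract does not state the authors' exact sampling procedure, so the function name `sample_chain_lengths`, its parameters, and the polystyrene-like input values are all hypothetical.

```python
import random


def sample_chain_lengths(mn, dispersity, repeat_unit_mass, n_chains=8, seed=0):
    """Draw degrees of polymerization for n_chains chains (illustrative helper).

    Assumes chain masses follow a gamma (Schulz-Zimm) number distribution
    with number-average mass mn and dispersity Mw/Mn.
    """
    if dispersity <= 1.0:
        raise ValueError("dispersity must exceed 1 for a Schulz-Zimm distribution")
    k = 1.0 / (dispersity - 1.0)  # gamma shape, since Mw/Mn = (k + 1) / k
    theta = mn / k                # gamma scale, so the mean k * theta equals mn
    rng = random.Random(seed)     # seeded for reproducible samples
    lengths = []
    for _ in range(n_chains):
        mass = rng.gammavariate(k, theta)
        # Convert chain mass to a whole number of repeat units, at least one.
        lengths.append(max(1, round(mass / repeat_unit_mass)))
    return lengths


# Hypothetical polystyrene-like input: Mn = 50 kg/mol, dispersity 2,
# repeat-unit mass of about 104.15 g/mol.
print(sample_chain_lengths(mn=50_000.0, dispersity=2.0, repeat_unit_mass=104.15))
```

Each sampled length could then determine how many repeat units are joined into one large chain graph. That construction is not shown here because the abstract does not specify it.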
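Abstract (reference translation): Graph neural networks (GNNs) have achieved strong results in molecular property prediction, but polymers pose distinct challenges: labeled datasets are scarce and small (typically on the order of hundreds of polymers) because they require expensive experiments, and complex polymer chain distributions influence polymer properties. Established practice in polymer prediction represents polymers solely by graphs of their repeat units, discarding the chain-scale morphology that governs key properties such as the glass transition temperature ($T_g$). In this work, we propose a principled graph construction that addresses this gap. Given a polymer's molecular mass distribution (MMD), we sample representative chains from the Schulz-Zimm distribution and construct representative sets of large graphs that encode chain-scale topology directly, with atoms and bonds featurized using rich chemical descriptors. We further pretrain GNN encoders via masked graph modeling on 100,000 unlabeled PSMILES strings before fine-tuning on labeled data. On a dataset of 381 polymers (180 homopolymers and 201 copolymers), we show that graph construction and self-supervised pretraining are jointly necessary: without pretraining, the large-graph method matches the repeat-unit baseline (28.40 K vs. 28.36 K RMSE); with pretraining, it achieves 24.76 K +/- 3.30 K, a 5.1% reduction in mean error relative to the pretrained repeat-unit baseline (26.08 K +/- 4.20 K, p < 0.001, 30 runs). An ablation that removes chemical features degrades performance to 36.65 K, confirming that both components are essential. The results are architecture-agnostic, holding for both GINE and GATv2 encoders.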
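The reported percentage can be checked from the RMSE values quoted in the abstract. The short sketch below recomputes the relative reduction in mean error. The helper name `relative_reduction` is hypothetical, and the inputs are the published means only, not per-run results.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# With pretraining: large-graph method (24.76 K) vs. repeat-unit baseline (26.08 K).
print(f"{relative_reduction(26.08, 24.76):.1%}")  # prints 5.1%

# Without pretraining: large-graph method (28.40 K) vs. repeat-unit baseline (28.36 K).
# The difference is 0.04 K, a fraction of a percent of the baseline.
print(f"{relative_reduction(28.36, 28.40):.2%}")
```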
