Fugu-MT Paper Translation (Abstract): Self-Conditioned Denoising for Atomistic Representation Learning

Paper abstract: Self-Conditioned Denoising for Atomistic Representation Learning

arxiv url: http://arxiv.org/abs/2603.17196v1
Date: Tue, 17 Mar 2026 22:52:18 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.432127
Title: Self-Conditioned Denoising for Atomistic Representation Learning
Authors: Tynan Perez, Rafael Gomez-Bombarelli,
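Abstract summary: Self-Conditioned Denoising (SCD) is a reconstruction objective that uses self-embeddings for conditional denoising across any domain of atomistic data. The authors show that a small, fast GNN pretrained with SCD can achieve competitive or superior performance compared with larger models pretrained on labeled or unlabeled datasets.
Reference score (independently computed attention score): 0.0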
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The success of large-scale pretraining in NLP and computer vision has catalyzed growing efforts to develop analogous foundation models for the physical sciences. However, pretraining strategies using atomistic data remain underexplored. To date, large-scale supervised pretraining on DFT force-energy labels has provided the strongest performance gains to downstream property prediction, out-performing existing methods of self-supervised learning (SSL) which remain limited to ground-state geometries, and/or single domains of atomistic data. We address these shortcomings with Self-Conditioned Denoising (SCD), a backbone-agnostic reconstruction objective that utilizes self-embeddings for conditional denoising across any domain of atomistic data, including small molecules, proteins, periodic materials, and 'non-equilibrium' geometries. When controlled for backbone architecture and pretraining dataset, SCD significantly outperforms previous SSL methods on downstream benchmarks and matches or exceeds the performance of supervised force-energy pretraining. We show that a small, fast GNN pretrained by SCD can achieve competitive or superior performance to larger models pretrained on significantly larger labeled or unlabeled datasets, across tasks in multiple domains. Our code is available at: https://github.com/TyJPerez/SelfConditionedDenoisingAtoms
