Fugu-MT Paper Translation (Summary): Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data

Paper summary: Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data

arxiv url: http://arxiv.org/abs/2207.07174v2
Date: Thu, 29 Aug 2024 19:27:49 GMT
Status: Translation complete
Last updated in system: 2024-09-02 20:50:35.771043
Title: Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data
Title (reference translation): Attribute graphs based on molecular generative models: a path to learning with limited data
Authors: Samuel C. Hoffman, Payel Das, Karthikeyan Shanmugam, Kahini Wadhawan, Prasanna Sattigeri
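Abstract summary: In this work, we propose an algorithm that relies on perturbation experiments on the latent codes of a pre-trained generative autoencoder to uncover an attribute graph. We show that one can fit an effective graphical model that models a structural equation model between latent codes and attributes. Using a pre-trained generative autoencoder trained on a large dataset of small molecules, we demonstrate that the graphical model can be used to predict a specific property.
Reference score (independently computed attention score): 42.517927809224275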
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training generative models that capture rich semantics of the data and interpreting the latent representations encoded by such models are very important problems in un-/self-supervised learning. In this work, we provide a simple algorithm that relies on perturbation experiments on latent codes of a pre-trained generative autoencoder to uncover an attribute graph that is implied by the generative model. We perform perturbation experiments to check for influence of a given latent variable on a subset of attributes. Given this, we show that one can fit an effective graphical model that models a structural equation model between latent codes taken as exogenous variables and attributes taken as observed variables. One interesting aspect is that a single latent variable controls multiple overlapping subsets of attributes unlike conventional approaches that try to impose full independence. Using a pre-trained generative autoencoder trained on a large dataset of small molecules, we demonstrate that the graphical model between various molecular attributes and latent codes learned by our algorithm can be used to predict a specific property for molecules which are drawn from a different distribution. We compare prediction models trained on various feature subsets chosen by simple baselines, as well as existing causal discovery and sparse learning/feature selection methods, with the ones in the derived Markov blanket from our method. Results show empirically that the predictor that relies on our Markov blanket attributes is robust to distribution shifts when transferred or fine-tuned with a few samples from the new distribution, especially when training data is limited.
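Abstract (reference translation): Training generative models that capture the rich semantics of data, and interpreting the latent representations encoded by such models, are very important problems in unsupervised learning. In this work, we propose a simple algorithm that relies on perturbation experiments on the latent codes of a pre-trained generative autoencoder to uncover the attribute graph implied by the generative model. We perform perturbation experiments to check the influence of a given latent variable on a subset of attributes. From this, we show that one can fit an effective graphical model that models a structural equation model between latent codes, taken as exogenous variables, and attributes, taken as observed variables. One interesting aspect is that, unlike conventional approaches that try to enforce full independence, a single latent variable controls multiple overlapping subsets of attributes. Using a pre-trained generative autoencoder trained on a large dataset of small molecules, we show that the graphical model between various molecular attributes and latent codes learned by our algorithm can be used to predict a specific property of molecules drawn from a different distribution. We compared prediction models trained on various feature subsets chosen by simple baselines, as well as by existing causal discovery and sparse learning/feature selection methods, with prediction models based on the Markov blanket derived from our method. The results demonstrate that the predictor relying on the Markov blanket attributes is robust to distribution shift when transferred or fine-tuned with a few samples from the new distribution, especially when training data is limited.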
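The abstract describes the core step only at a high level: perturb one latent variable of a pre-trained generative autoencoder and check which attributes change. The following is a minimal sketch of that step under stated assumptions, not the authors' implementation. `decode_attributes` is a hypothetical stand-in for decoding a latent code into a molecule and computing its attributes, the perturbation size `delta` and the edge `threshold` are illustrative choices, and `toy_decoder` is a made-up linear map rather than a molecular model. Fitting the structural equation model and deriving the Markov blanket are not covered.

```python
import random
from typing import Callable, Dict, List, Sequence, Set

Latent = List[float]
Attributes = Dict[str, float]


def perturbation_attribute_graph(
    decode_attributes: Callable[[Latent], Attributes],
    latent_codes: Sequence[Latent],
    delta: float = 1.0,
    threshold: float = 0.1,
) -> Dict[int, Set[str]]:
    """Map each latent dimension to the attributes it appears to influence.

    Hypothetical rule: for every (non-empty list of) latent code z and
    dimension i, decode z and z + delta * e_i, average the absolute change
    of each attribute over all codes, and keep an edge i -> attribute when
    that average exceeds `threshold`.
    """
    dim = len(latent_codes[0])
    names = list(decode_attributes(list(latent_codes[0])).keys())
    graph: Dict[int, Set[str]] = {}
    for i in range(dim):
        total_change = {name: 0.0 for name in names}
        for z in latent_codes:
            base = decode_attributes(list(z))
            shifted = list(z)
            shifted[i] += delta  # perturb only latent dimension i
            perturbed = decode_attributes(shifted)
            for name in names:
                total_change[name] += abs(perturbed[name] - base[name])
        graph[i] = {
            name for name in names
            if total_change[name] / len(latent_codes) > threshold
        }
    return graph


def toy_decoder(z: Latent) -> Attributes:
    # Hypothetical linear "decoder + attribute calculator" for illustration.
    return {
        "attr_a": 2.0 * z[0] + z[1],
        "attr_b": z[1] - 0.5 * z[2],
        "attr_c": 0.05 * z[0] + 3.0 * z[2],
    }


if __name__ == "__main__":
    rng = random.Random(0)
    codes = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
    graph = perturbation_attribute_graph(toy_decoder, codes)
    print({i: sorted(attrs) for i, attrs in graph.items()})
    # → {0: ['attr_a'], 1: ['attr_a', 'attr_b'], 2: ['attr_b', 'attr_c']}
```

In this toy run, latent dimensions 1 and 2 each influence two attributes and `attr_b` appears in both subsets, which mirrors the overlapping-subset structure the abstract highlights rather than a one-latent-per-attribute disentanglement. The weak `0.05` coefficient falls below the threshold, so the edge from dimension 0 to `attr_c` is pruned.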

Related papers