Fugu-MT Paper Translation (Abstract): STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation

Paper summary: STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation

arxiv url: http://arxiv.org/abs/2511.02769v1
Date: Tue, 04 Nov 2025 17:56:00 GMT
Status: translation complete
Last updated in system: 2025-11-05 18:47:06.129877
Title: STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation
Title (reference translation): STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation
Authors: Bum Chul Kwon, Ben Shapira, Moshiko Raboh, Shreyans Sethi, Shruti Murarka, Joseph A Morrone, Jianying Hu, Parthasarathy Suryanarayanan
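Abstract summary: This paper proposes STAR-VAE (Selfies-encoded, Transformer-based, AutoRegressive Variational Auto Encoder). It is trained on 79 million drug-like molecules from PubChem, using SELFIES to guarantee syntactic validity. The contributions are: (i) a Transformer-based latent-variable encoder-decoder model trained on SELFIES representations; (ii) a conditional latent-variable formulation for property-guided generation; and (iii) efficient finetuning with low-rank adapters (LoRA) in both encoder and decoder.
Reference score (independently computed attention level): 3.585036812627313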
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The chemical space of drug-like molecules is vast, motivating the development of generative models that must learn broad chemical distributions, enable conditional generation by capturing structure-property representations, and provide fast molecular generation. Meeting the objectives depends on modeling choices, including the probabilistic modeling approach, the conditional generative formulation, the architecture, and the molecular input representation. To address the challenges, we present STAR-VAE (Selfies-encoded, Transformer-based, AutoRegressive Variational Auto Encoder), a scalable latent-variable framework with a Transformer encoder and an autoregressive Transformer decoder. It is trained on 79 million drug-like molecules from PubChem, using SELFIES to guarantee syntactic validity. The latent-variable formulation enables conditional generation: a property predictor supplies a conditioning signal that is applied consistently to the latent prior, the inference network, and the decoder. Our contributions are: (i) a Transformer-based latent-variable encoder-decoder model trained on SELFIES representations; (ii) a principled conditional latent-variable formulation for property-guided generation; and (iii) efficient finetuning with low-rank adapters (LoRA) in both encoder and decoder, enabling fast adaptation with limited property and activity data. On the GuacaMol and MOSES benchmarks, our approach matches or exceeds baselines, and latent-space analyses reveal smooth, semantically structured representations that support both unconditional exploration and property-aware generation. On the Tartarus benchmarks, the conditional model shifts docking-score distributions toward stronger predicted binding. These results suggest that a modernized, scale-appropriate VAE remains competitive for molecular generation when paired with principled conditioning and parameter-efficient finetuning.
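Abstract (reference translation): The chemical space of drug-like molecules is vast, which motivates generative models that learn broad chemical distributions, enable conditional generation by capturing structure-property representations, and generate molecules quickly. Meeting these objectives depends on modeling choices, including the probabilistic modeling approach, the conditional generative formulation, the architecture, and the molecular input representation. To address these challenges, the authors propose STAR-VAE (Selfies-encoded, Transformer-based, AutoRegressive Variational Auto Encoder), a scalable latent-variable framework with a Transformer encoder and an autoregressive Transformer decoder. It is trained on 79 million drug-like molecules from PubChem, using SELFIES to guarantee syntactic validity. The latent-variable formulation enables conditional generation: a property predictor supplies a conditioning signal that is applied consistently to the latent prior, the inference network, and the decoder. The contributions are: (i) a Transformer-based latent-variable encoder-decoder model trained on SELFIES representations; (ii) a principled conditional latent-variable formulation for property-guided generation; and (iii) efficient finetuning with low-rank adapters (LoRA) in both encoder and decoder, enabling fast adaptation with limited property and activity data. On the GuacaMol and MOSES benchmarks, the approach matches or exceeds baselines, and latent-space analyses reveal smooth, semantically structured representations that support both unconditional exploration and property-aware generation. On the Tartarus benchmarks, the conditional model shifts docking-score distributions toward stronger predicted binding. These results suggest that a modernized, scale-appropriate VAE remains competitive for molecular generation when paired with principled conditioning and parameter-efficient finetuning.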
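
Note on the conditional formulation: the abstract says the conditioning signal is applied to the latent prior, the inference network, and the decoder, but it does not state the training objective. As a rough sketch only, the standard conditional-VAE evidence lower bound has exactly these three conditioned components. The notation here is assumed, not taken from the paper: x is a SELFIES sequence, c is the conditioning signal from the property predictor, z is the latent variable, q_\phi is the inference network (encoder), p_\theta(z | c) is the conditional prior, and p_\theta(x | z, c) is the autoregressive decoder.

```latex
\log p_\theta(x \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
  \;-\; \mathrm{KL}\big(q_\phi(z \mid x, c)\,\|\,p_\theta(z \mid c)\big)
```

The first term rewards reconstructing x from z under the condition c. The KL term keeps the conditioned posterior close to the conditioned prior, so that sampling z from p_\theta(z | c) at generation time yields molecules consistent with c. The paper's actual objective may differ, for example in how the terms are weighted.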
