A Group Symmetric Stochastic Differential Equation Model for Molecule
Multi-modal Pretraining
- URL: http://arxiv.org/abs/2305.18407v1
- Date: Sun, 28 May 2023 15:56:02 GMT
- Title: A Group Symmetric Stochastic Differential Equation Model for Molecule
Multi-modal Pretraining
- Authors: Shengchao Liu, Weitao Du, Zhiming Ma, Hongyu Guo, Jian Tang
- Abstract summary: Molecule pretraining has quickly become the go-to schema to boost the performance of AI-based drug discovery.
Here, we propose MoleculeSDE to generate the 3D geometries from 2D topologies, and vice versa, directly in the input space.
By comparing with 17 pretraining baselines, we empirically verify that MoleculeSDE can learn an expressive representation with state-of-the-art performance on 26 out of 32 downstream tasks.
- Score: 36.48602272037559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecule pretraining has quickly become the go-to schema to boost the
performance of AI-based drug discovery. Naturally, molecules can be represented
as 2D topological graphs or 3D geometric point clouds. Although most existing
pretraining methods focus on only a single modality, recent research has
shown that maximizing the mutual information (MI) between such two modalities
enhances the molecule representation ability. Meanwhile, existing molecule
multi-modal pretraining approaches approximate MI based on the representation
space encoded from the topology and geometry, thus resulting in the loss of
critical structural information of molecules. To address this issue, we propose
MoleculeSDE. MoleculeSDE leverages group symmetric (e.g., SE(3)-equivariant and
reflection-antisymmetric) stochastic differential equation models to generate
the 3D geometries from 2D topologies, and vice versa, directly in the input
space. It not only obtains a tighter MI bound but also supports a broader range of
downstream tasks than previous work. By comparing against 17 pretraining
baselines, we empirically verify that MoleculeSDE can learn an expressive
representation with state-of-the-art performance on 26 out of 32 downstream
tasks.
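A minimal sketch of the generative machinery described above, assuming the standard score-based SDE formulation (the exact drift, diffusion, and network design used in MoleculeSDE are not reproduced here): for the 2D-to-3D direction, the 3D coordinates $x$ conditioned on the 2D graph $G$ follow a forward noising SDE
  $dx = f(x, t)\,dt + g(t)\,dw_t$,
and new geometries are generated by integrating the reverse-time SDE
  $dx = \left[f(x, t) - g(t)^2\, s_\theta(x, G, t)\right]dt + g(t)\,d\bar{w}_t$,
where the score network $s_\theta \approx \nabla_x \log p_t(x \mid G)$ is constrained to be SE(3)-equivariant and reflection-antisymmetric. The 3D-to-2D direction is modeled analogously, so both generative directions operate directly in the input space rather than in an encoded representation space.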
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Pre-training of Molecular GNNs via Conditional Boltzmann Generator [0.0]
We propose a pre-training method for molecular GNNs using an existing dataset of molecular conformations.
We show that our model has a better prediction performance for molecular properties than existing pre-training methods.
arXiv Detail & Related papers (2023-12-20T15:30:15Z) - Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D
Diffusion [19.151643496588022]
We propose a pretraining method for molecule joint auto-encoding (MoleculeJAE).
MoleculeJAE can learn both the 2D bond (topology) and 3D conformation (geometry) information.
Empirically, MoleculeJAE proves its effectiveness by reaching state-of-the-art performance on 15 out of 20 tasks.
arXiv Detail & Related papers (2023-12-06T12:58:37Z) - Geometric Latent Diffusion Models for 3D Molecule Generation [172.15028281732737]
Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries.
We propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM).
arXiv Detail & Related papers (2023-05-02T01:07:22Z) - MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates.
We propose a novel graph transformer architecture to denoise the diffusion process.
Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z) - Learning Harmonic Molecular Representations on Riemannian Manifold [18.49126496517951]
Molecular representation learning plays a crucial role in AI-assisted drug discovery research.
We propose a Harmonic Molecular Representation learning framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface.
arXiv Detail & Related papers (2023-03-27T18:02:47Z) - Geometry-Complete Diffusion for 3D Molecule Generation and Optimization [3.8366697175402225]
We introduce the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation.
GCDM outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings.
We also show that GCDM's geometric features can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules.
arXiv Detail & Related papers (2023-02-08T20:01:51Z) - GeoMol: Torsional Geometric Generation of Molecular 3D Conformer
Ensembles [60.12186997181117]
Prediction of a molecule's 3D conformer ensemble from the molecular graph plays a key role in cheminformatics and drug discovery.
Existing generative models have several drawbacks, including a failure to model important elements of molecular geometry.
We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate 3D conformers.
arXiv Detail & Related papers (2021-06-08T14:17:59Z) - An End-to-End Framework for Molecular Conformation Generation via
Bilevel Programming [71.82571553927619]
We propose an end-to-end solution for molecular conformation prediction called ConfVAE.
Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program.
arXiv Detail & Related papers (2021-05-15T15:22:29Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have achieved initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose MultI-constraint MOlecule SAmpling (MIMOSA), a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution (see the sketch below).
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
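For intuition on such sampling-based molecule optimization, below is a minimal Metropolis-Hastings-style sketch in Python. The helpers propose_edit (a local substructure modification) and target_score (a combined multi-property objective) are hypothetical placeholders; MIMOSA's actual GNN-guided proposals and acceptance rule are more involved.

import math
import random

def accept_prob(score_new, score_old, temperature=1.0):
    # Standard Metropolis acceptance probability on an (unnormalized) target score.
    delta = (score_new - score_old) / temperature
    return 1.0 if delta >= 0 else math.exp(delta)

def sample_molecules(seed_mol, propose_edit, target_score, n_steps=1000):
    # Markov-chain sampling that starts from the input molecule (initial guess)
    # and walks toward molecules scoring well under multiple property constraints.
    current = seed_mol
    current_score = target_score(current)
    accepted = [current]
    for _ in range(n_steps):
        candidate = propose_edit(current)  # hypothetical local edit (add / replace / delete a substructure)
        candidate_score = target_score(candidate)
        if random.random() < accept_prob(candidate_score, current_score):
            current, current_score = candidate, candidate_score
            accepted.append(current)
    return accepted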