BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
- URL: http://arxiv.org/abs/2406.03686v1
- Date: Thu, 6 Jun 2024 02:10:50 GMT
- Title: BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
- Authors: Artem Zholus, Maksim Kuznetsov, Roman Schutski, Rim Shayakhmetov, Daniil Polykovskiy, Sarath Chandar, Alex Zhavoronkov
- Abstract summary: We present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site.
We show how such a simple conceptual approach, combined with pretraining and scaling, can perform on par with or better than the current best specialized diffusion models.
- Score: 11.862370962277938
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our model produces molecular graphs and conformations jointly, eliminating the need for an extra graph reconstruction step. We pretrain BindGPT on a large-scale dataset and fine-tune it with reinforcement learning using scores from external simulation software. We demonstrate how a single pretrained language model can serve at the same time as a 3D molecular generative model, a conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. Notably, the model does not make any representational equivariance assumptions about the domain of generation. We show how such a simple conceptual approach, combined with pretraining and scaling, can perform on par with or better than the current best specialized diffusion models, language models, and graph neural networks while being two orders of magnitude cheaper to sample.
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- Zero Shot Molecular Generation via Similarity Kernels [0.6597195879147557]
We present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation.
SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules.
We also release an interactive web tool that allows users to generate structures with SiMGen online.
arXiv Detail & Related papers (2024-02-13T17:53:44Z)
- Pre-training of Molecular GNNs via Conditional Boltzmann Generator [0.0]
We propose a pre-training method for molecular GNNs using an existing dataset of molecular conformations.
We show that our model has a better prediction performance for molecular properties than existing pre-training methods.
arXiv Detail & Related papers (2023-12-20T15:30:15Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- Learning Joint 2D & 3D Diffusion Models for Complete Molecule Generation [32.66694406638287]
We propose a new joint 2D and 3D diffusion model (JODO) that generates molecules with atom types, formal charges, bond information, and 3D coordinates.
Our model can also be extended for inverse molecular design targeting single or multiple quantum properties.
arXiv Detail & Related papers (2023-05-21T04:49:53Z)
- MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates.
We propose a novel graph transformer architecture to denoise the diffusion process.
Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z)
- Probabilistic Generative Transformer Language models for Generative Design of Molecules [10.412989388092084]
Generative Molecular Transformer (GMTransformer) is a probabilistic neural network model for generative design of molecules.
Our model is built on the blank filling language model originally developed for text processing.
Our models achieve high novelty and Scaf compared to other baselines.
arXiv Detail & Related papers (2022-09-20T01:51:57Z)
- Keeping it Simple: Language Models can learn Complex Molecular Distributions [0.0]
We introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules.
The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions.
arXiv Detail & Related papers (2021-12-06T13:40:58Z)
- GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles [60.12186997181117]
Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery.
Existing generative models have several drawbacks including lack of modeling important molecular geometry elements.
We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate 3D conformers.
arXiv Detail & Related papers (2021-06-08T14:17:59Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
- Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn a latent space energy-based prior model with SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.