Probabilistic Generative Transformer Language models for Generative Design of Molecules
- URL: http://arxiv.org/abs/2209.09406v1
- Date: Tue, 20 Sep 2022 01:51:57 GMT
- Title: Probabilistic Generative Transformer Language models for Generative Design of Molecules
- Authors: Lai Wei, Nihang Fu, Yuqi Song, Qian Wang, Jianjun Hu
- Abstract summary: Generative Molecular Transformer (GMTransformer) is a probabilistic neural network model for generative design of molecules.
Our model is built on the blank-filling language model originally developed for text processing.
Our models achieve high novelty and Scaf (scaffold similarity) scores compared with other baselines.
- Score: 10.412989388092084
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised neural language models have recently found wide applications
in generative design of organic molecules and protein sequences as well as
representation learning for downstream structure classification and functional
prediction. However, most existing deep learning models for molecule
design require large datasets and have black-box architectures, which
makes it difficult to interpret their design logic. Here we propose Generative
Molecular Transformer (GMTransformer), a probabilistic neural network model for
generative design of molecules. Our model is built on the blank-filling
language model originally developed for text processing, which has demonstrated
unique advantages in learning the "molecule grammar", with high-quality
generation, interpretability, and data efficiency. Benchmarked on the MOSES
dataset, our models achieve high novelty and Scaf (scaffold similarity) scores
compared with other baselines. The probabilistic generation steps have potential
for tinkering molecule design, since they can recommend how to modify existing
molecules with explanations, guided by the implicit molecular chemistry the model
has learned. The source
code and datasets can be accessed freely at
https://github.com/usccolumbia/GMTransformer
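The blank-filling generation process mentioned above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the GMTransformer implementation: it assumes a blank-language-model style action space (choose a blank, fill it with a token, optionally open new blanks to its left and/or right) and replaces the trained network with uniform random choices over a toy SMILES token vocabulary. The names `TOY_VOCAB`, `fill_blanks`, and `max_tokens` are illustrative only. Because nothing here has learned the molecule grammar, the output is generally not a valid SMILES string.

```python
import random
from typing import List, Optional

BLANK = "__"  # placeholder for an unfilled position on the canvas

# Hypothetical toy vocabulary of single-character SMILES tokens.
# A trained model would predict a distribution over its own learned vocabulary.
TOY_VOCAB = ["C", "O", "N", "=", "(", ")"]


def fill_blanks(max_tokens: int = 8, seed: Optional[int] = None) -> List[str]:
    """Build a token sequence by repeatedly filling blanks on a canvas.

    Each step follows a blank-language-model style action space:
      1. choose one blank on the canvas,
      2. replace it with a token,
      3. decide whether to open a new blank to its left and/or right.
    Uniform random choices stand in for a trained network's predictions.
    Once max_tokens tokens have been placed, no new blanks are opened,
    so the loop always terminates.
    """
    rng = random.Random(seed)
    canvas: List[str] = [BLANK]  # generation starts from a single blank
    n_tokens = 0
    while BLANK in canvas:
        blank_positions = [i for i, t in enumerate(canvas) if t == BLANK]
        pos = rng.choice(blank_positions)   # step 1: pick a blank
        token = rng.choice(TOY_VOCAB)       # step 2: pick a token
        n_tokens += 1
        can_grow = n_tokens < max_tokens
        open_left = can_grow and rng.random() < 0.5   # step 3: open new blanks?
        open_right = can_grow and rng.random() < 0.5
        replacement = ([BLANK] if open_left else []) + [token] + ([BLANK] if open_right else [])
        canvas[pos:pos + 1] = replacement  # splice the filled token into the canvas
    return canvas


if __name__ == "__main__":
    print("".join(fill_blanks(max_tokens=8, seed=0)))
```

In a trained model, each of the three choices would instead come from predicted probabilities; those per-step probabilities are the kind of signal a user could inspect when asking which token the model would place at a given position of an existing molecule.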
Related papers
- BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning [11.862370962277938]
We present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site.
We show how such a simple conceptual approach, combined with pretraining and scaling, can perform on par with or better than the current best specialized diffusion models.
(2024-06-06T02:10:50Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
(2023-11-26T06:56:02Z)
- GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text [25.979382232281786]
We introduce GIT-Mol, a multi-modal large language model that integrates the Graph, Image, and Text information.
We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
(2023-08-14T03:12:29Z)
- An Equivariant Generative Framework for Molecular Graph-Structure Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
(2023-04-12T13:34:22Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
(2022-09-12T00:56:57Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
(2022-08-23T17:01:16Z)
- Crystal Transformer: Self-learning neural language model for Generative and Tinkering Design of Materials [4.813020904720316]
BLMM Crystal Transformer is a neural-network-based probabilistic generative model for generative and tinkering design of inorganic materials.
It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity.
A user-friendly web app has been developed for computational materials doping and can be accessed freely at www.materialsatlas.org/blmtinker.
(2022-04-25T20:20:26Z)
- Keeping it Simple: Language Models can learn Complex Molecular Distributions [0.0]
We introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules.
The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions.
(2021-12-06T13:40:58Z)
- Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations [1.433758865948252]
We propose PASITHEA, a gradient-based molecule optimization technique adapted from computer vision.
It exploits gradients by directly reversing the learning process of a neural network that is trained to predict real-valued chemical properties.
Although our results are preliminary, we observe a shift in distribution of a chosen property during inverse-training, a clear indication of PASITHEA's viability.
(2020-12-17T16:34:59Z)
- Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn a latent-space energy-based prior model with the SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
(2020-10-19T09:34:20Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules: the biggest GNN and the largest training dataset in molecular representation learning.
(2020-06-18T08:37:04Z)