Probabilistic Generative Transformer Language models for Generative
Design of Molecules
- URL: http://arxiv.org/abs/2209.09406v1
- Date: Tue, 20 Sep 2022 01:51:57 GMT
- Title: Probabilistic Generative Transformer Language models for Generative
Design of Molecules
- Authors: Lai Wei, Nihang Fu, Yuqi Song, Qian Wang, Jianjun Hu
- Abstract summary: Generative Molecular Transformer (GMTransformer) is a probabilistic neural network model for generative design of molecules.
Our model is built on the blank filling language model originally developed for text processing.
Our models achieve high novelty and Scaf (scaffold) scores compared to other baselines.
- Score: 10.412989388092084
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised neural language models have recently found wide applications
in generative design of organic molecules and protein sequences as well as
representation learning for downstream structure classification and functional
prediction. However, most of the existing deep learning models for molecule
design usually require a big dataset and have a black-box architecture, which
makes it difficult to interpret their design logic. Here we propose Generative
Molecular Transformer (GMTransformer), a probabilistic neural network model for
generative design of molecules. Our model is built on the blank filling
language model originally developed for text processing, which has demonstrated
unique advantages in learning the "molecule grammars" with high-quality
generation, interpretability, and data efficiency. Benchmarked on the MOSES
datasets, our models achieve high novelty and Scaf (scaffold) scores compared to other baselines.
The probabilistic generation steps lend themselves to tinkering-style molecule design: they can recommend how to modify an existing molecule, with an explanation, guided by the implicit molecular chemistry the model has learned. The source
code and datasets can be accessed freely at
https://github.com/usccolumbia/GMTransformer
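To make the blank-filling formulation concrete, the following is a minimal, purely illustrative Python sketch of how such generation could proceed over SMILES tokens: the canvas starts as a single blank, and each step fills one blank and optionally opens new blanks around it. The vocabulary, `toy_policy` function, and expansion probabilities are hypothetical stand-ins for the trained GMTransformer, not the authors' released implementation (see the repository above for that).

```python
import random

# Toy illustration of blank-filling generation over SMILES tokens.
# "__" marks a blank that can still be expanded; a trained GMTransformer
# would score these choices, while a random policy stands in for it here.

VOCAB = ["C", "c", "N", "O", "F", "(", ")", "=", "1"]
BLANK = "__"

def toy_policy(canvas, blank_index):
    """Hypothetical stand-in for the model: pick a token for the chosen blank
    and decide whether to open new blanks to its left and/or right."""
    token = random.choice(VOCAB)
    expand_left = random.random() < 0.3
    expand_right = random.random() < 0.3
    return token, expand_left, expand_right

def generate(max_steps=20):
    canvas = [BLANK]                      # generation starts from a single blank
    for _ in range(max_steps):
        blanks = [i for i, t in enumerate(canvas) if t == BLANK]
        if not blanks:
            break                         # no blanks left: generation is finished
        i = random.choice(blanks)         # the model would also pick which blank to fill
        token, left, right = toy_policy(canvas, i)
        replacement = ([BLANK] if left else []) + [token] + ([BLANK] if right else [])
        canvas = canvas[:i] + replacement + canvas[i + 1:]
    return "".join(t for t in canvas if t != BLANK)  # drop any unfilled blanks

if __name__ == "__main__":
    print(generate())  # a SMILES-like string; chemical validity is not guaranteed here
```

Because every step is an explicit token decision, the same mechanism can in principle re-open blanks inside an existing SMILES string and propose localized, explainable edits, which is the tinkering-style use the abstract describes.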
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms (a toy sketch of this masking idea follows the related-papers list below).
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Chemical Language Model Linker: blending text and molecules with modular adapters [2.2667044928324747]
We propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML).
ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions.
We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance.
arXiv Detail & Related papers (2024-10-26T13:40:13Z) - BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning [11.862370962277938]
We present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site.
We show how such a conceptually simple approach, combined with pretraining and scaling, can perform on par with or better than the current best specialized diffusion models.
arXiv Detail & Related papers (2024-06-06T02:10:50Z) - Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small- to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z) - GIT-Mol: A Multi-modal Large Language Model for Molecular Science with
Graph, Image, and Text [25.979382232281786]
We introduce GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information.
We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
arXiv Detail & Related papers (2023-08-14T03:12:29Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Crystal Transformer: Self-learning neural language model for Generative
and Tinkering Design of Materials [4.813020904720316]
BLMM Crystal Transformer is a neural network based probabilistic generative model for generative and tinkering design of inorganic materials.
It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity.
A user-friendly web app has been developed for computational materials doping and can be accessed freely at www.materialsatlas.org/blmtinker.
arXiv Detail & Related papers (2022-04-25T20:20:26Z) - Keeping it Simple: Language Models can learn Complex Molecular
Distributions [0.0]
We introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules.
The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions.
arXiv Detail & Related papers (2021-12-06T13:40:58Z) - Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn a latent space energy-based prior model with a SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z) - Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
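As referenced in the "Pre-trained Molecular Language Models with Random Functional Group Masking" entry above, the snippet below is a rough, illustrative sketch of masking a contiguous SMILES subsequence for masked-language-model-style pretraining. The crude regex tokenizer, fixed span length, and mask token are assumptions made here for brevity; the cited paper masks chemically meaningful functional groups rather than arbitrary spans.

```python
import random
import re

# Rough illustration of masking a contiguous SMILES subsequence; a real
# implementation would target chemically meaningful functional groups and
# use a proper SMILES tokenizer rather than this crude regex.

SMILES_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|.")  # simplistic tokenizer (assumption)

def mask_subsequence(smiles, span=3, mask_token="[MASK]"):
    """Replace a random contiguous span of tokens with mask tokens and return
    both the corrupted string and the span the model should reconstruct."""
    tokens = SMILES_TOKEN.findall(smiles)
    if len(tokens) <= span:
        return smiles, []                 # too short to mask
    start = random.randrange(len(tokens) - span)
    target = tokens[start:start + span]
    masked = tokens[:start] + [mask_token] * span + tokens[start + span:]
    return "".join(masked), target

if __name__ == "__main__":
    corrupted, answer = mask_subsequence("CC(=O)Oc1ccccc1C(=O)O")  # aspirin SMILES
    print(corrupted, "->", answer)
```

A model pretrained this way would be trained to reconstruct the masked span from the corrupted string, which is the sense in which such masking is meant to force the model to infer molecular structure.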
This list is automatically generated from the titles and abstracts of the papers on this site.