Probabilistic Generative Transformer Language models for Generative
Design of Molecules
- URL: http://arxiv.org/abs/2209.09406v1
- Date: Tue, 20 Sep 2022 01:51:57 GMT
- Title: Probabilistic Generative Transformer Language models for Generative
Design of Molecules
- Authors: Lai Wei, Nihang Fu, Yuqi Song, Qian Wang, Jianjun Hu
- Abstract summary: Generative Molecular Transformer (GMTransformer) is a probabilistic neural network model for generative design of molecules.
Our model is built on the blank filling language model originally developed for text processing.
Our models achieve high novelty and Scaf (scaffold) scores compared to other baselines.
- Score: 10.412989388092084
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised neural language models have recently found wide applications
in generative design of organic molecules and protein sequences as well as
representation learning for downstream structure classification and functional
prediction. However, most of the existing deep learning models for molecule
design usually require a big dataset and have a black-box architecture, which
makes it difficult to interpret their design logic. Here we propose Generative
Molecular Transformer (GMTransformer), a probabilistic neural network model for
generative design of molecules. Our model is built on the blank filling
language model originally developed for text processing, which has demonstrated
unique advantages in learning the "molecule grammars" with high-quality
generation, interpretability, and data efficiency. Benchmarked on the MOSES
datasets, our models achieve high novelty and Scaf (scaffold) scores compared to other baselines.
The probabilistic generation steps lend themselves to tinkering-style molecule design: they can recommend how to modify an existing molecule, with an explanation, guided by the implicit molecular chemistry the model has learned. The source
code and datasets can be accessed freely at
https://github.com/usccolumbia/GMTransformer
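To make the blank-filling formulation concrete, the following is a minimal, purely illustrative Python sketch of how such generation could proceed over SMILES tokens: the canvas starts as a single blank, and each step fills one blank and optionally opens new blanks around it. The vocabulary, `toy_policy` function, and expansion probabilities are hypothetical stand-ins for the trained GMTransformer, not the authors' released implementation (see the repository above for that).

```python
import random

# Toy illustration of blank-filling generation over SMILES tokens.
# "__" marks a blank that can still be expanded; a trained GMTransformer
# would score these choices, while a random policy stands in for it here.

VOCAB = ["C", "c", "N", "O", "F", "(", ")", "=", "1"]
BLANK = "__"

def toy_policy(canvas, blank_index):
    """Hypothetical stand-in for the model: pick a token for the chosen blank
    and decide whether to open new blanks to its left and/or right."""
    token = random.choice(VOCAB)
    expand_left = random.random() < 0.3
    expand_right = random.random() < 0.3
    return token, expand_left, expand_right

def generate(max_steps=20):
    canvas = [BLANK]                      # generation starts from a single blank
    for _ in range(max_steps):
        blanks = [i for i, t in enumerate(canvas) if t == BLANK]
        if not blanks:
            break                         # no blanks left: generation is finished
        i = random.choice(blanks)         # the model would also pick which blank to fill
        token, left, right = toy_policy(canvas, i)
        replacement = ([BLANK] if left else []) + [token] + ([BLANK] if right else [])
        canvas = canvas[:i] + replacement + canvas[i + 1:]
    return "".join(t for t in canvas if t != BLANK)  # drop any unfilled blanks

if __name__ == "__main__":
    print(generate())  # a SMILES-like string; chemical validity is not guaranteed here
```

Because every step is an explicit token decision, the same mechanism can in principle re-open blanks inside an existing SMILES string and propose localized, explainable edits, which is the tinkering-style use the abstract describes.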
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms (a toy sketch of this masking idea follows the related-papers list below).
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Chemical Language Model Linker: blending text and molecules with modular adapters [2.2667044928324747]
We propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML).
ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions.
We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance.
arXiv Detail & Related papers (2024-10-26T13:40:13Z) - BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning [11.862370962277938]
We present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site.
We show how such a conceptually simple approach, combined with pretraining and scaling, can perform on par with or better than the current best specialized diffusion models.
arXiv Detail & Related papers (2024-06-06T02:10:50Z) - Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small- to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z) - GIT-Mol: A Multi-modal Large Language Model for Molecular Science with
Graph, Image, and Text [25.979382232281786]
We introduce GIT-Mol, a multi-modal large language model that integrates Graph, Image, and Text information.
We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
arXiv Detail & Related papers (2023-08-14T03:12:29Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Crystal Transformer: Self-learning neural language model for Generative
and Tinkering Design of Materials [4.813020904720316]
BLMM Crystal Transformer is a neural network based probabilistic generative model for generative and tinkering design of inorganic materials.
It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity.
A user-friendly web app has been developed for computational materials doping and can be accessed freely at www.materialsatlas.org/blmtinker.
arXiv Detail & Related papers (2022-04-25T20:20:26Z) - Keeping it Simple: Language Models can learn Complex Molecular
Distributions [0.0]
We introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules.
The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions.
arXiv Detail & Related papers (2021-12-06T13:40:58Z) - Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn a latent space energy-based prior model with a SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z) - Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
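As referenced in the "Pre-trained Molecular Language Models with Random Functional Group Masking" entry above, the snippet below is a rough, illustrative sketch of masking a contiguous SMILES subsequence for masked-language-model-style pretraining. The crude regex tokenizer, fixed span length, and mask token are assumptions made here for brevity; the cited paper masks chemically meaningful functional groups rather than arbitrary spans.

```python
import random
import re

# Rough illustration of masking a contiguous SMILES subsequence; a real
# implementation would target chemically meaningful functional groups and
# use a proper SMILES tokenizer rather than this crude regex.

SMILES_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|.")  # simplistic tokenizer (assumption)

def mask_subsequence(smiles, span=3, mask_token="[MASK]"):
    """Replace a random contiguous span of tokens with mask tokens and return
    both the corrupted string and the span the model should reconstruct."""
    tokens = SMILES_TOKEN.findall(smiles)
    if len(tokens) <= span:
        return smiles, []                 # too short to mask
    start = random.randrange(len(tokens) - span)
    target = tokens[start:start + span]
    masked = tokens[:start] + [mask_token] * span + tokens[start + span:]
    return "".join(masked), target

if __name__ == "__main__":
    corrupted, answer = mask_subsequence("CC(=O)Oc1ccccc1C(=O)O")  # aspirin SMILES
    print(corrupted, "->", answer)
```

A model pretrained this way would be trained to reconstruct the masked span from the corrupted string, which is the sense in which such masking is meant to force the model to infer molecular structure.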
This list is automatically generated from the titles and abstracts of the papers on this site.