LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo
Molecular Design
- URL: http://arxiv.org/abs/2311.14407v1
- Date: Fri, 24 Nov 2023 10:59:12 GMT
- Title: LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo
Molecular Design
- Authors: Niklas Dobberstein, Astrid Maass, Jan Hamaekers
- Abstract summary: "LLamol" is a single novel generative transformer model based on the LLama 2 architecture.
We demonstrate that the model adeptly handles single- and multi-conditional organic molecule generation with up to four conditions.
In detail, we showcase the model's capability to utilize token sequences for conditioning, either individually or in combination with numerical properties.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative models have demonstrated substantial promise in Natural Language
Processing (NLP) and have found application in designing molecules, as seen in
General Pretrained Transformer (GPT) models. In our efforts to develop such a
tool for exploring the organic chemical space in search of potentially
electro-active compounds, we present "LLamol", a single novel generative
transformer model based on the LLama 2 architecture, which was trained on a 13M
superset of organic compounds drawn from diverse public sources. To allow for
maximum flexibility in usage and robustness in the face of potentially incomplete
data, we introduce "Stochastic Context Learning" as a new training procedure.
We demonstrate that the resulting model adeptly handles single- and
multi-conditional organic molecule generation with up to four conditions, yet
more are possible. The model generates valid molecular structures in SMILES
notation while flexibly incorporating up to three numerical properties and/or one
token sequence into the generative process, as requested. The generated compounds are
very satisfactory in all scenarios tested. In detail, we showcase the model's
capability to utilize token sequences for conditioning, either individually or
in combination with numerical properties, making LLamol a potent tool for de
novo molecule design, easily expandable with new properties.
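The abstract names "Stochastic Context Learning" without detailing it; the core idea it implies — randomly dropping a subset of the conditioning properties for each training sample, so the model learns to generate under any combination of conditions — can be sketched as follows. All function names, token formats, and property names here are hypothetical illustrations, not the paper's actual implementation.

```python
import random

def stochastic_context(properties, keep_prob=0.5, rng=random):
    """Keep each conditioning property independently with probability
    keep_prob. Hypothetical sketch: the model trains on whatever subset
    survives, so at inference time any combination of conditions
    (including none) has already been seen."""
    return {name: value for name, value in properties.items()
            if rng.random() < keep_prob}

def build_prompt(properties, smiles):
    """Serialize the (possibly reduced) condition set plus the target
    SMILES into one token sequence, as a conditional language model
    would consume it. The token format is invented for illustration."""
    cond = " ".join(f"<{k}={v}>" for k, v in sorted(properties.items()))
    return f"{cond} <gen> {smiles}" if cond else f"<gen> {smiles}"

# Example: four numeric conditions; a random subset survives each sample.
props = {"logp": 2.1, "sascore": 3.0, "mol_wt": 250.0, "homo_lumo": 1.8}
print(build_prompt(stochastic_context(props), "c1ccccc1"))
```

At inference time the same serialization would carry exactly the conditions the user supplies; because arbitrary subsets were seen during training, no single condition is mandatory.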
Related papers
- LDMol: Text-Conditioned Molecule Diffusion Model Leveraging Chemically Informative Latent Space [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol, which enables natural text-conditioned molecule generation.
Specifically, LDMol is composed of three building blocks: a molecule encoder that produces a chemically informative feature space, a natural-language-conditioned latent diffusion model using a Diffusion Transformer (DiT), and an autoregressive decoder for molecule generation.
arXiv Detail & Related papers (2024-05-28T04:59:13Z) - Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [50.756644656847165]
We introduce TSMMG, a multi-constraint molecular generation large language model that plays the role of a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from various 'teacher' models.
We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex, natural-language-described property requirements.
arXiv Detail & Related papers (2024-03-20T02:15:55Z) - A novel molecule generative model of VAE combined with Transformer for unseen structure generation [0.0]
Transformer and VAE are widely used as powerful models, but they are rarely used in combination due to structural and performance mismatch.
This study proposes a model that combines these two models through structural and parameter optimization in handling diverse molecules.
The proposed model shows performance comparable to existing models when generating molecules, and far superior performance when generating molecules with unseen structures.
arXiv Detail & Related papers (2024-02-19T08:46:04Z) - Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks [45.9401235464876]
We introduce the first MoleculAR Conformer Ensemble Learning benchmark to thoroughly evaluate the potential of learning on conformer ensembles.
Our findings reveal that direct learning from the conformer space can improve performance on a variety of tasks and models.
arXiv Detail & Related papers (2023-09-29T20:06:46Z) - An Equivariant Generative Framework for Molecular Graph-Structure
Co-Design [54.92529253182004]
We present MolCode, a machine-learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z) - PrefixMol: Target- and Chemistry-aware Molecule Design via Prefix
Embedding [34.27649279751879]
We develop a novel generative model that considers both the targeted pocket's circumstances and a variety of chemical properties.
Experiments show that our model exhibits good controllability in both single and multi-conditional molecular generation.
arXiv Detail & Related papers (2023-02-14T15:27:47Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Augmenting Molecular Deep Generative Models with Topological Data
Analysis Representations [21.237758981760784]
We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules.
Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
arXiv Detail & Related papers (2021-06-08T15:49:21Z) - Learning Neural Generative Dynamics for Molecular Conformation
Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z) - Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn a latent-space energy-based prior model with SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z)
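Among the entries above, the retrieval-based framework steers a pre-trained generator with a small set of exemplar molecules that satisfy the design criteria. The retrieval step alone can be sketched like this (a toy illustration with hypothetical names; the score function stands in for any property predictor or similarity measure):

```python
def retrieve_exemplars(candidates, score_fn, k=3):
    """Return the k candidate molecules that best satisfy a design
    criterion (higher score = better). Purely illustrative: in the
    actual framework the exemplars then guide the generative model."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]

# Toy usage: the "property" is a placeholder, here the number of
# oxygen atoms appearing in each SMILES string.
pool = ["CCO", "c1ccccc1", "OCC(O)CO", "CCN", "O=C(O)CC(O)C(=O)O"]
exemplars = retrieve_exemplars(pool, score_fn=lambda s: s.count("O"), k=2)
print(exemplars)  # the two most oxygen-rich SMILES from the pool
```

Because the retrieval step is decoupled from generation, such a framework can remain agnostic to the choice of generative model, as the summary above notes.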
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.