Zero Shot Molecular Generation via Similarity Kernels
- URL: http://arxiv.org/abs/2402.08708v1
- Date: Tue, 13 Feb 2024 17:53:44 GMT
- Title: Zero Shot Molecular Generation via Similarity Kernels
- Authors: Rokas Elijošius, Fabian Zills, Ilyes Batatia, Sam Walton Norwood, Dávid Péter Kovács, Christian Holm and Gábor Csányi
- Abstract summary: We present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation.
SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules.
We also release an interactive web tool that allows users to generate structures with SiMGen online.
- Score: 0.6597195879147557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative modelling aims to accelerate the discovery of novel chemicals by
directly proposing structures with desirable properties. Recently, score-based,
or diffusion, generative models have significantly outperformed previous
approaches. Key to their success is the close relationship between the score
and physical force, allowing the use of powerful equivariant neural networks.
However, the behaviour of the learnt score is not yet well understood. Here, we
analyse the score by training an energy-based diffusion model for molecular
generation. We find that during the generation the score resembles a
restorative potential initially and a quantum-mechanical force at the end. In
between the two endpoints, it exhibits special properties that enable the
building of large molecules. Using insights from the trained model, we present
Similarity-based Molecular Generation (SiMGen), a new method for zero shot
molecular generation. SiMGen combines a time-dependent similarity kernel with
descriptors from a pretrained machine learning force field to generate
molecules without any further training. Our approach allows full control over
the molecular shape through point cloud priors and supports conditional
generation. We also release an interactive web tool that allows users to
generate structures with SiMGen online (https://zndraw.icp.uni-stuttgart.de).
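The abstract says SiMGen steers generation with a time-dependent similarity kernel over descriptors from a pretrained force field. It also ties the score to physical force: for a Boltzmann density p ∝ exp(-E/k_BT), ∇ log p = -∇E/k_BT, the force divided by k_BT. The sketch below is a hypothetical, simplified reading of that idea, not the authors' implementation. It computes a log-sum of Gaussian kernels between each atom's descriptor and a set of reference descriptors, then differentiates that sum with respect to atomic positions to obtain a score. Several pieces are assumptions that stand in for the MLFF descriptors and the paper's exact kernel: the toy distance-only descriptor (`descriptors`), the linear width schedule (`sigma`, with assumed `sigma_min=0.1` and `sigma_max=1.0`), and the finite-difference gradient.

```python
import math

# Hypothetical sketch, not SiMGen's implementation: a toy distance-based
# descriptor stands in for the pretrained force-field descriptors.

def descriptors(positions, widths=(1.0, 2.0)):
    """Per-atom Gaussian-smeared neighbour counts (toy, depends only on distances)."""
    out = []
    for i, pi in enumerate(positions):
        feats = []
        for w in widths:
            total = 0.0
            for j, pj in enumerate(positions):
                if i != j:
                    r2 = sum((a - b) ** 2 for a, b in zip(pi, pj))
                    total += math.exp(-r2 / w**2)
            feats.append(total)
        out.append(feats)
    return out


def sigma(t, sigma_min=0.1, sigma_max=1.0):
    """Assumed linear kernel-width schedule: narrow at t=0, broad at t=1."""
    return sigma_min + (sigma_max - sigma_min) * t


def log_similarity(positions, ref_desc, t):
    """Sum over atoms of log sum_ref exp(-|d_i - d_ref|^2 / (2 sigma(t)^2))."""
    s2 = sigma(t) ** 2
    total = 0.0
    for d in descriptors(positions):
        logits = [-sum((a - b) ** 2 for a, b in zip(d, r)) / (2 * s2)
                  for r in ref_desc]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        total += m + math.log(sum(math.exp(z - m) for z in logits))
    return total


def kernel_score(positions, ref_desc, t, eps=1e-5):
    """Gradient of log_similarity w.r.t. each coordinate (central differences)."""
    score = []
    for i in range(len(positions)):
        grad = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += eps
            minus[i][k] -= eps
            diff = log_similarity(plus, ref_desc, t) - log_similarity(minus, ref_desc, t)
            grad.append(diff / (2 * eps))
        score.append(grad)
    return score


# Usage: a stretched triangle is scored against an equilateral reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, math.sqrt(3) / 2, 0.0)]
ref_desc = descriptors(reference)
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.75, 1.5 * math.sqrt(3) / 2, 0.0)]
score = kernel_score(query, ref_desc, t=0.5)
# One small gradient-ascent step along the score increases the log-similarity.
moved = [[x + 1e-3 * g for x, g in zip(p, gi)] for p, gi in zip(query, score)]
```

The toy descriptor depends only on interatomic distances, so the per-atom scores sum to zero (no net translation). Widening the kernel as t grows yields a smaller, smoother score, while a narrow kernel pulls atoms sharply toward environments that match the reference.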
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
(2024-11-03T01:56:15Z)
- Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning [1.747623282473278]
The SPICE dataset is a collection of quantum chemistry calculations for training machine learning potentials.
We train a set of potential energy functions called Nutmeg on it.
(2024-06-18T23:54:21Z)
- BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning [11.862370962277938]
We present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site.
We show that such a conceptually simple approach, combined with pretraining and scaling, can perform on par with or better than the current best specialized diffusion models.
(2024-06-06T02:10:50Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
(2023-06-09T03:04:21Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel "pre-train, prompt, fine-tune" paradigm for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
(2022-12-20T19:32:30Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
(2022-08-23T17:01:16Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
(2022-06-06T06:17:11Z)
- Torsional Diffusion for Molecular Conformer Generation [28.225704750892795]
Torsional diffusion is a novel diffusion framework that operates on the space of torsion angles.
On a standard benchmark of drug-like molecules, torsional diffusion generates superior conformer ensembles.
Our model provides exact likelihoods, which we employ to build the first generalizable Boltzmann generator.
(2022-06-01T04:30:41Z)
- Fragment-based molecular generative model with high generalization ability and synthetic accessibility [0.0]
We propose a fragment-based molecular generative model which designs new molecules with target properties.
A key feature of our model is a high generalization ability in terms of property control and fragment types.
We show that the model can generate molecules with the simultaneous control of multiple target properties at a high success rate.
(2021-11-25T04:44:37Z)
- Relative Molecule Self-Attention Transformer [4.020171169198032]
Relative Molecule Attention Transformer (R-MAT) is a novel Transformer-based model built on a newly developed self-attention layer; it achieves state-of-the-art or very competitive results across a wide range of molecule property prediction tasks.
(2021-10-12T09:05:26Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
(2020-12-02T02:09:31Z)
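The MOOD summary above mentions conditional generation that uses gradients from a property predictor. Below is a minimal sketch of that general idea, gradient guidance, and it rests on loud assumptions. A one-dimensional standard-normal prior stands in for the learned score. `property_grad` is a hypothetical predictor whose log-likelihood peaks at an assumed `target=3.0`, and `weight` is an assumed guidance strength. Sampling uses plain Langevin dynamics rather than MOOD's reverse-time SDE, and MOOD's out-of-distribution control is not modelled.

```python
import random

# Hypothetical 1-D sketch of gradient guidance, not MOOD's implementation.

def prior_score(x):
    """Score of a standard normal prior: d/dx log N(x; 0, 1) = -x."""
    return -x


def property_grad(x, target=3.0):
    """Gradient of an assumed property log-likelihood -(x - target)^2 / 2."""
    return -(x - target)


def langevin(weight, n_samples=200, n_steps=1000, dt=0.01, seed=0):
    """Unadjusted Langevin dynamics on prior_score + weight * property_grad."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for _ in range(n_steps):
            drift = prior_score(x) + weight * property_grad(x)
            x += drift * dt + (2 * dt) ** 0.5 * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


unguided = langevin(weight=0.0)
guided = langevin(weight=4.0)
```

With a Gaussian prior and a quadratic predictor, the guided target is again Gaussian with mean weight·target/(1 + weight) = 2.4. The guided samples therefore shift toward the property target while the unguided ones stay near 0. Raising `weight` trades fidelity to the prior for property satisfaction.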