CHA2: CHemistry Aware Convex Hull Autoencoder Towards Inverse Molecular
Design
- URL: http://arxiv.org/abs/2302.11000v1
- Date: Tue, 21 Feb 2023 21:05:31 GMT
- Title: CHA2: CHemistry Aware Convex Hull Autoencoder Towards Inverse Molecular
Design
- Authors: Mohammad Sajjad Ghaemi, Hang Hu, Anguang Hu, Hsu Kiang Ooi
- Abstract summary: It is impossible to explore the entire search space comprehensively to exploit de novo structures with properties of interest.
To address this challenge, reducing the intractable search space into a lower-dimensional latent volume helps examine molecular candidates more feasibly.
We propose using a convex hull surrounding the top molecules in terms of high QEDs to ensnare a tight subspace in the latent representation as an efficient way to reveal novel molecules with high QEDs.
- Score: 2.169755083801688
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Optimizing molecular design and discovering novel chemical structures to meet
certain objectives, such as quantitative estimates of the drug-likeness score
(QEDs), is NP-hard due to the vast combinatorial design space of discrete
molecular structures, which makes it nearly impossible to explore the entire
search space comprehensively to exploit de novo structures with properties of
interest. To address this challenge, reducing the intractable search space into
a lower-dimensional latent volume helps examine molecular candidates more
feasibly via inverse design. Autoencoders are suitable deep learning
techniques, equipped with an encoder that reduces the discrete molecular
structure into a latent space and a decoder that inverts the search space back
to the molecular design. The continuous property of the latent space, which
characterizes the discrete chemical structures, provides a flexible
representation for inverse design in order to discover novel molecules.
However, exploring this latent space requires certain insights to generate new
structures. We propose using a convex hull surrounding the top molecules in
terms of high QEDs to ensnare a tight subspace in the latent representation as
an efficient way to reveal novel molecules with high QEDs. We demonstrate the
effectiveness of our suggested method by using QM9 as the training dataset
along with the Self-Referencing Embedded Strings (SELFIES) representation to
calibrate the autoencoder in order to carry out inverse molecular design
that uncovers novel chemical structures.
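The abstract's core idea, restricting the search to the convex hull of the latent vectors of the top-QED molecules, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`top_k_latents`, `sample_in_convex_hull`), the toy 2-D latent vectors, and the random stand-in scores are all assumptions made for illustration, and the trained encoder and decoder are omitted. Any convex combination of the anchor points lies inside their hull, so drawing random non-negative weights that sum to one is one simple way to propose candidates there, although it does not sample the hull uniformly.

```python
import random


def top_k_latents(latents, scores, k):
    """Return the latent vectors of the k highest-scoring molecules."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [latents[i] for i in order[:k]]


def sample_in_convex_hull(points, rng):
    """Return a random convex combination of `points` (lists of floats).

    Normalising independent Exp(1) draws gives Dirichlet(1, ..., 1) weights,
    i.e. non-negative weights that sum to one.
    """
    raw = [rng.expovariate(1.0) for _ in points]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]


# Toy stand-ins (assumed): 2-D "latent vectors" and random "QED" scores.
rng = random.Random(0)
latents = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
qed_scores = [rng.random() for _ in latents]

anchors = top_k_latents(latents, qed_scores, k=5)
candidates = [sample_in_convex_hull(anchors, rng) for _ in range(10)]
```

In a full pipeline, each candidate vector would be passed through the trained decoder to obtain a SELFIES string and then scored for QED; both steps are outside the scope of this sketch.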
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
- Active Deep Kernel Learning of Molecular Functionalities: Realizing Dynamic Structural Embeddings [0.26716003713321473]
This paper explores an approach for active learning in molecular discovery using Deep Kernel Learning (DKL).
DKL offers a more holistic perspective by correlating structure with properties, creating latent spaces that prioritize molecular functionality.
The formation of exclusion regions around certain compounds indicates unexplored areas with potential for groundbreaking functionalities.
arXiv Detail & Related papers (2024-03-02T15:34:31Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z)
- Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics [21.95240820041655]
Variational Autoencoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions.
We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
arXiv Detail & Related papers (2020-10-18T13:33:02Z)
- ChemoVerse: Manifold traversal of latent spaces for novel molecule discovery [0.7742297876120561]
It is essential to identify molecular structures with the desired chemical properties.
Recent advances in generative models using neural networks and machine learning are being widely used to design virtual libraries of drug-like compounds.
arXiv Detail & Related papers (2020-09-29T12:11:40Z)
- Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design.
In this setup, the agent learns to navigate through the immense synthetically accessible chemical space.
We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z)
- Reinforcement Learning for Molecular Design Guided by Quantum Mechanics [10.112779201155005]
We present a novel RL formulation for molecular design in Cartesian coordinates, thereby extending the class of molecules that can be built.
Our reward function is directly based on fundamental physical properties such as the energy, which we approximate via fast quantum-chemical methods.
In our experiments, we show that our agent can efficiently learn to solve these tasks from scratch by working in a translation and rotation invariant state-action space.
arXiv Detail & Related papers (2020-02-18T16:43:58Z)