NEBULA: Neural Empirical Bayes Under Latent Representations for Efficient and Controllable Design of Molecular Libraries
- URL: http://arxiv.org/abs/2407.03428v1
- Date: Wed, 3 Jul 2024 18:10:43 GMT
- Title: NEBULA: Neural Empirical Bayes Under Latent Representations for Efficient and Controllable Design of Molecular Libraries
- Authors: Ewa M. Nowara, Pedro O. Pinheiro, Sai Pooja Mahajan, Omar Mahmood, Andrew Martin Watkins, Saeed Saremi, Michael Maser,
- Abstract summary: We present NEBULA, the first latent 3D generative model for scalable generation of large molecular libraries around a seed compound of interest.
NEBULA generates large molecular libraries nearly an order of magnitude faster than existing methods without sacrificing sample quality.
We expect the approach herein to be highly enabling for machine learning-based drug discovery.
- Score: 5.350316354464512
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present NEBULA, the first latent 3D generative model for scalable generation of large molecular libraries around a seed compound of interest. Such libraries are crucial for scientific discovery, but it remains challenging to generate large numbers of high quality samples efficiently. 3D-voxel-based methods have recently shown great promise for generating high quality samples de novo from random noise (Pinheiro et al., 2023). However, sampling in 3D-voxel space is computationally expensive and use in library generation is prohibitively slow. Here, we instead perform neural empirical Bayes sampling (Saremi & Hyvarinen, 2019) in the learned latent space of a vector-quantized variational autoencoder. NEBULA generates large molecular libraries nearly an order of magnitude faster than existing methods without sacrificing sample quality. Moreover, NEBULA generalizes better to unseen drug-like molecules, as demonstrated on two public datasets and multiple recently released drugs. We expect the approach herein to be highly enabling for machine learning-based drug discovery. The code is available at https://github.com/prescient-design/nebula
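To make the sampling idea in the abstract concrete, here is a minimal sketch of walk-jump sampling, the sampler commonly associated with neural empirical Bayes: noise a seed latent, run Langevin MCMC on the noise-smoothed density (walk), then denoise with a single empirical Bayes step (jump). It is not the NEBULA implementation. The function names (`toy_score`, `jump`, `walk_jump`) and all parameter values are hypothetical, the seed vector stands in for an encoded seed compound, and the analytic Gaussian score stands in for a learned denoiser.

```python
import numpy as np


def toy_score(y, mu, data_var, sigma):
    """Score (gradient of log density) of the smoothed distribution.

    Assumption for illustration only: clean latents are x ~ N(mu, data_var * I)
    and noisy latents are y = x + N(0, sigma^2 * I), so y ~ N(mu, (data_var + sigma^2) * I).
    A real model would replace this with a learned denoiser or score network.
    """
    return -(y - mu) / (data_var + sigma**2)


def jump(y, score_fn, sigma):
    """Jump step: empirical Bayes estimate of the clean latent, x_hat = y + sigma^2 * score(y)."""
    return y + sigma**2 * score_fn(y)


def walk_jump(seed_latent, score_fn, sigma, n_samples=8, n_steps=50,
              step_size=0.05, rng=None):
    """Draw n_samples denoised latents around seed_latent.

    Walk: unadjusted Langevin MCMC on the smoothed density of y.
    Jump: one denoising step applied to the final noisy latents.
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = seed_latent.shape[-1]
    # Start every chain from a noised copy of the seed latent.
    y = seed_latent + sigma * rng.standard_normal((n_samples, dim))
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step_size * score_fn(y) + np.sqrt(2.0 * step_size) * noise
    return jump(y, score_fn, sigma)


if __name__ == "__main__":
    mu = np.zeros(4)                      # hypothetical latent mean
    seed = np.full(4, 0.5)                # hypothetical seed latent
    score = lambda y: toy_score(y, mu, data_var=1.0, sigma=1.0)
    library = walk_jump(seed, score, sigma=1.0, rng=np.random.default_rng(0))
    print(library.shape)
```

In a latent-space pipeline of the kind the abstract describes, the returned vectors would then be quantized and decoded back to molecules; that stage is omitted here because the abstract does not specify it.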
Related papers
- Score-based 3D molecule generation with neural fields [10.0889807546726]
We introduce a new representation for 3D molecules based on their continuous atomic density fields.
We propose a new model based on walk-jump sampling for unconditional 3D molecule generation in the continuous space using neural fields.
Our model, FuncMol, encodes molecular fields into latent codes using a conditional neural field.
FuncMol performs all-atom generation of 3D molecules without assumptions on the molecular structure and scales well with the size of molecules, unlike most approaches.
arXiv Detail & Related papers (2025-01-15T01:10:59Z)
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [81.34900892130929]
We explore inference compute as another axis for scaling by increasing the number of generated samples.
In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance.
We find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers.
arXiv Detail & Related papers (2024-07-31T17:57:25Z)
- RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
- Structure-based drug design by denoising voxel grids [5.9535699822923]
We present VoxBind, a new score-based generative model for 3D molecules conditioned on protein structures.
Our approach represents molecules as 3D atomic density grids and leverages a 3D voxel-denoising network for learning and generation.
arXiv Detail & Related papers (2024-05-07T02:48:15Z)
- 3D molecule generation by denoising voxel grids [5.50581548670289]
We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids.
We train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules.
Our experiments show that VoxMol captures the distribution of drug-like molecules better than state of the art, while being faster to generate samples.
arXiv Detail & Related papers (2023-06-13T00:38:51Z)
- Semi-supervised 3D Object Detection with Proficient Teachers [114.54835359657707]
Dominant point cloud-based 3D object detectors in autonomous driving scenarios rely heavily on huge amounts of accurately labeled samples.
Pseudo-labeling is commonly used in SSL frameworks; however, low-quality predictions from the teacher model seriously limit its performance.
We propose a new Pseudo-Labeling framework for semi-supervised 3D object detection, by enhancing the teacher model to a proficient one with several necessary designs.
arXiv Detail & Related papers (2022-07-26T04:54:03Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- NeuSample: Neural Sample Field for Efficient View Synthesis [129.10351459066501]
We propose a lightweight module called a neural sample field.
The proposed sample field maps rays into sample distributions, which can be transformed into point coordinates and fed into radiance fields for volume rendering.
We show that NeuSample achieves better rendering quality than NeRF while enjoying a faster inference speed.
arXiv Detail & Related papers (2021-11-30T16:43:49Z)
- Fully Spiking Variational Autoencoder [66.58310094608002]
Spiking neural networks (SNNs) can be run on neuromorphic devices with ultra-high speed and ultra-low energy consumption.
In this study, we build a variational autoencoder (VAE) with SNN to enable image generation.
arXiv Detail & Related papers (2021-09-26T06:10:14Z)
- MoleHD: Ultra-Low-Cost Drug Discovery using Hyperdimensional Computing [2.7462881838152913]
We present MoleHD, a method based on brain-inspired hyperdimensional computing (HDC) for molecular property prediction.
MoleHD achieves the highest ROC-AUC score on random and scaffold splits on average across 3 datasets.
To the best of our knowledge, this is the first HDC-based method for drug discovery.
arXiv Detail & Related papers (2021-06-05T13:33:21Z)
- Generate Novel Molecules With Target Properties Using Conditional Generative Models [0.0]
We present a novel neural network for generating small molecules similar to the ones in the training set.
Our network outperforms previous methods using Molecular weight, LogP and Quantitative Estimation of Drug-likeness as the evaluation metrics.
arXiv Detail & Related papers (2020-09-15T18:59:26Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)