A biologically-inspired evaluation of molecular generative machine
learning
- URL: http://arxiv.org/abs/2208.09658v1
- Date: Sat, 20 Aug 2022 11:01:10 GMT
- Title: A biologically-inspired evaluation of molecular generative machine
learning
- Authors: Elizaveta Vinogradova, Abay Artykbayev, Alisher Amanatay, Mukhamejan
Karatayev, Maxim Mametkulov, Albina Li, Anuar Suleimenov, Abylay Salimzhanov,
Karina Pats, Rustam Zhumagambetov, Ferdinand Molnár, Vsevolod Peshkov,
Siamac Fazli
- Abstract summary: A novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed.
We propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs.
- Score: 17.623886600638716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While generative models have recently become ubiquitous in many scientific
areas, less attention has been paid to their evaluation. For molecular
generative models, state-of-the-art evaluation examines their output in isolation
or in relation to its input. However, biological and functional properties of the
output, such as ligand-target interaction, are not addressed. In this study, a
novel biologically-inspired benchmark for the evaluation of molecular
generative models is proposed. Specifically, three diverse reference datasets
are designed, and a set of metrics directly relevant to the drug discovery
process is introduced. In particular, we propose a recreation metric and apply
drug-target affinity prediction and molecular docking as complementary
techniques for the evaluation of generative outputs. While all three metrics
show consistent results across the tested generative models, a more detailed
comparison of drug-target binding affinity and molecular docking scores
revealed that unimodal predictors can lead to erroneous conclusions about
target binding on a molecular level and that a multi-modal approach is thus
preferable. The key advantage of this framework is that it incorporates prior
physico-chemical domain knowledge into the benchmarking process by focusing
explicitly on ligand-target interactions, thus creating a highly efficient
tool not only for evaluating molecular generative outputs in particular, but
also for enriching the drug discovery process in general.
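The abstract names a recreation metric without defining it. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of held-out reference molecules that a generative model reproduces exactly. The function name `recreation_rate`, the `canonicalize` parameter, and the example SMILES strings are hypothetical and are not taken from the paper; a real pipeline would canonicalize SMILES with a cheminformatics toolkit rather than the whitespace stripping used here.

```python
from typing import Callable, Iterable


def recreation_rate(
    generated: Iterable[str],
    held_out: Iterable[str],
    canonicalize: Callable[[str], str] = str.strip,
) -> float:
    """Fraction of held-out reference molecules found in the generated set.

    Assumption: a molecule counts as "recreated" when its canonical string
    matches exactly. The paper may use a different definition.
    """
    generated_set = {canonicalize(s) for s in generated}
    reference_set = {canonicalize(s) for s in held_out}
    if not reference_set:
        raise ValueError("held_out must contain at least one molecule")
    # Molecules present in both the reference set and the generated set.
    recreated = reference_set & generated_set
    return len(recreated) / len(reference_set)


# Hypothetical toy data: two of the four held-out molecules are recreated.
generated = ["CCO", "c1ccccc1", "CC(=O)O", "CCO "]
held_out = ["CCO", "CCN", "CC(=O)O", "CCCC"]
rate = recreation_rate(generated, held_out)
```

Normalizing by the size of the held-out set, rather than by the number of generated molecules, keeps the score comparable across models that emit different numbers of samples.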
Related papers
- Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties.
It can generate molecules with state-of-the-art binding energies with up to -7.07 Avg. Vina Score.
arXiv Detail & Related papers (2024-07-01T06:10:29Z)
- MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z)
- TAGMol: Target-Aware Gradient-guided Molecule Generation [19.977071499171903]
3D generative models have shown significant promise in structure-based drug design (SBDD).
We decouple the problem into molecular generation and property prediction.
The latter synergistically guides the diffusion sampling process, facilitating guided diffusion and resulting in the creation of meaningful molecules with the desired properties.
We call this guided molecular generation process TAGMol.
arXiv Detail & Related papers (2024-06-03T14:43:54Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Target-aware Molecular Graph Generation [37.937378787812264]
We propose SiamFlow, which forces the flow to fit the distribution of target sequence embeddings in latent space.
Specifically, we employ an alignment loss and a uniform loss to bring target sequence embeddings and drug graph embeddings into agreement.
Experiments quantitatively show that our proposed method learns meaningful representations in the latent space toward the target-aware molecular graph generation.
arXiv Detail & Related papers (2022-02-10T04:31:14Z)
- Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z)
- Analysis of training and seed bias in small molecules generated with a conditional graph-based variational autoencoder -- Insights for practical AI-driven molecule generation [0.0]
We analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE).
Our graph-based generative model is shown to excel in producing desired conditioned activities and favorable unconditioned physical properties in generated molecules.
arXiv Detail & Related papers (2021-07-19T16:00:05Z)
- Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction [4.307720252429733]
In machine learning approaches, the numerical representation of molecules is critical to the performance of the model.
We propose a self-attention-based multi-view representation learning approach for modeling drug-target interactions.
arXiv Detail & Related papers (2020-05-01T14:28:17Z)