Using Molecular Embeddings in QSAR Modeling: Does it Make a Difference?
- URL: http://arxiv.org/abs/2104.02604v2
- Date: Wed, 28 Jul 2021 15:30:22 GMT
- Title: Using Molecular Embeddings in QSAR Modeling: Does it Make a Difference?
- Authors: María Virginia Sabando, Ignacio Ponzoni, Evangelos E. Milios, Axel J. Soto
- Abstract summary: We reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques.
We compared these five methods concerning their performance in QSAR scenarios using different classification and regression datasets.
Our results highlight the need for conducting a careful comparison and analysis of the different embedding techniques prior to using them in drug design tasks.
- Score: 0.6299766708197883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the consolidation of deep learning in drug discovery, several novel
algorithms for learning molecular representations have been proposed. Despite
the interest of the community in developing new methods for learning molecular
embeddings and their theoretical benefits, comparing molecular embeddings with
each other and with traditional representations is not straightforward, which
in turn hinders the process of choosing a suitable representation for QSAR
modeling. A reason behind this issue is the difficulty of conducting a fair and
thorough comparison of the different existing embedding approaches, which
requires numerous experiments on various datasets and training scenarios. To
close this gap, we reviewed the literature on methods for molecular embeddings
and reproduced three unsupervised and two supervised molecular embedding
techniques recently proposed in the literature. We compared these five methods
concerning their performance in QSAR scenarios using different classification
and regression datasets. We also compared these representations to traditional
molecular representations, namely molecular descriptors and fingerprints. As
opposed to the expected outcome, our experimental setup consisting of over
25,000 trained models and statistical tests revealed that the predictive
performance using molecular embeddings did not significantly surpass that of
traditional representations. While supervised embeddings yielded competitive
results compared to those using traditional molecular representations,
unsupervised embeddings tended to perform worse than traditional
representations. Our results highlight the need for conducting a careful
comparison and analysis of the different embedding techniques prior to using
them in drug design tasks, and motivate a discussion about the potential of
molecular embeddings in computer-aided drug design.
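As a heavily simplified illustration of the comparison described above, the sketch below trains the same classifier on RDKit Morgan fingerprints and on a stand-in learned embedding, then compares the paired cross-validation scores with a simple statistical test. The SMILES strings, labels, random embedding matrix, model choice, and test are illustrative assumptions only; the paper's actual protocol spans many datasets, embedding methods, and over 25,000 trained models.

```python
# Hedged sketch: traditional representation (Morgan fingerprints) vs. a learned
# molecular embedding on a toy binary QSAR task. All data is synthetic.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCC",
          "c1ccncc1", "CC(C)O", "CCCl", "CC#N", "C1CCCCC1"]
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])  # toy activity labels

# Traditional representation: 2048-bit Morgan (ECFP-like) fingerprints.
fingerprints = np.array([
    np.array(AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048))
    for s in smiles
])

# Learned representation: placeholder for vectors produced by any molecular
# embedding model (unsupervised or supervised); random here for illustration.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(smiles), 300))

# Same downstream model and evaluation protocol for both representations.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores_fp = cross_val_score(clf, fingerprints, labels, cv=5, scoring="roc_auc")
scores_emb = cross_val_score(clf, embeddings, labels, cv=5, scoring="roc_auc")

# A simple paired test over fold scores; the paper's statistical analysis is
# far more extensive (many datasets, repetitions, and models).
t_stat, p_value = ttest_rel(scores_fp, scores_emb)
print(f"fingerprints AUC={scores_fp.mean():.2f}  "
      f"embeddings AUC={scores_emb.mean():.2f}  p={p_value:.3f}")
```

Swapping the random placeholder for the vectors produced by each embedding method, and repeating this across datasets and training scenarios, is essentially the comparison the paper scales up.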
Related papers
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain, which uses a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction [0.0]
We introduce DIG-Mol, a novel self-supervised graph neural network framework for molecular property prediction.
DIG-Mol integrates a momentum distillation network with two interconnected networks to efficiently improve molecular characterization.
We have established DIG-Mol's state-of-the-art performance through extensive experimental evaluation in a variety of molecular property prediction tasks.
arXiv Detail & Related papers (2024-05-04T10:09:27Z)
- Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry [6.049566024728809]
Deep learning-based molecular property prediction has emerged as a solution to the resource-intensive nature of traditional methods.
In this paper, we propose a novel multi-modal representation learning model, called SGGRL, for molecular property prediction.
To ensure consistency across modalities, SGGRL is trained to maximize the similarity of representations of the same molecule while minimizing the similarity between different molecules (a generic sketch of this kind of contrastive objective appears after this list).
arXiv Detail & Related papers (2024-01-07T02:18:00Z)
- Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization than state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z)
- Molecular Property Prediction by Semantic-invariant Contrastive Learning [26.19431931932982]
We develop a Fragment-based Semantic-Invariant Contrastive Learning model (FraSICL), built on a fragment-based molecular view generation method, for molecular property prediction.
Despite using the fewest pre-training samples, FraSICL achieves state-of-the-art performance compared with major existing counterpart models.
arXiv Detail & Related papers (2023-03-13T07:32:37Z)
- Improving Molecular Pretraining with Complementary Featurizations [20.86159731100242]
Molecular pretraining is a paradigm to solve a variety of tasks in computational chemistry and drug discovery.
We show that different featurization techniques convey chemical information differently.
We propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO).
arXiv Detail & Related papers (2022-09-29T21:11:09Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast [17.142976840521264]
We propose iMolCLR, an improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs).
Experiments have shown that the proposed strategies significantly improve the performance of GNN models.
iMolCLR intrinsically embeds scaffolds and functional groups, which can be used to reason about molecular similarities.
arXiv Detail & Related papers (2022-02-18T18:33:27Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout framework learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
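Several of the entries above (SGGRL, FraSICL, iMolCLR, DIG-Mol) describe contrastive objectives that pull together the representations of the same molecule and push apart those of different molecules. As a generic illustration only, and not the specific loss used in any of those papers, the sketch below implements an NT-Xent-style contrastive loss in PyTorch; the two "views" could stand in for different augmentations, fragments, or modalities of a molecule.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent-style contrastive loss: z1[i] and z2[i] are two views of the
    same molecule (positive pair); every other pairing is a negative."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2n, d), unit-norm rows
    sim = z @ z.t() / temperature                          # cosine-similarity logits
    sim.fill_diagonal_(float("-inf"))                      # exclude self-similarity
    # For row i, the positive example sits n positions away in the concatenated batch.
    targets = (torch.arange(2 * n, device=z.device) + n) % (2 * n)
    return F.cross_entropy(sim, targets)

# Toy usage: 4 molecules, 64-dimensional representations from two views.
z_view1, z_view2 = torch.randn(4, 64), torch.randn(4, 64)
print(nt_xent_loss(z_view1, z_view2).item())
```

In a multi-modal setup like the one SGGRL describes, the two inputs would come from different encoders (e.g., sequence and graph) applied to the same molecule.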
This list is automatically generated from the titles and abstracts of the papers on this site.