Unsupervised Learning of Molecular Embeddings for Enhanced Clustering
and Emergent Properties for Chemical Compounds
- URL: http://arxiv.org/abs/2310.18367v1
- Date: Wed, 25 Oct 2023 18:00:24 GMT
- Title: Unsupervised Learning of Molecular Embeddings for Enhanced Clustering
and Emergent Properties for Chemical Compounds
- Authors: Jaiveer Gill, Ratul Chakraborty, Reetham Gubba, Amy Liu, Shrey Jain,
Chirag Iyer, Obaid Khwaja, Saurav Kumar
- Abstract summary: We introduce various methods to detect and cluster chemical compounds based on their SMILES data.
Our first method, analyzing the graphical structures of chemical compounds using embedding data, employs vector search to meet our threshold value.
We also used natural language description embeddings stored in a vector database with GPT3.5, which outperforms the base model.
- Score: 2.6803933204362336
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The detailed analysis of molecular structures and properties holds great
potential for drug development discovery through machine learning. Developing
an emergent property in the model to understand molecules would broaden the
horizons for development with a new computational tool. We introduce various
methods to detect and cluster chemical compounds based on their SMILES data.
Our first method, analyzing the graphical structures of chemical compounds
using embedding data, employs vector search to meet our threshold value. The
results yielded pronounced, concentrated clusters, and the method produced
favorable results in querying and understanding the compounds. We also used
natural language description embeddings stored in a vector database with
GPT3.5, which outperforms the base model. Thus, we introduce a similarity
search and clustering algorithm to aid in searching for and interacting with
molecules, enhancing efficiency in chemical exploration and enabling future
development of emergent properties in molecular property prediction models.
Related papers
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs into Heterogeneous Molecular Graph Learning (KCHML)
KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based underlineem Molecular underlineem Language underlineem Model, which randomly masking SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z) - MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA)
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Graph neural networks for the prediction of molecular structure-property
relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that directly work on the molecular graph.
GNNs allow to learn properties in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z) - Semi-Supervised GCN for learning Molecular Structure-Activity
Relationships [4.468952886990851]
We propose to train graph-to-graph neural network using semi-supervised learning for attributing structure-property relationships.
As final goal, our approach could represent a valuable tool to deal with problems such as activity cliffs, lead optimization and de-novo drug design.
arXiv Detail & Related papers (2022-01-25T09:09:43Z) - Improving VAE based molecular representations for compound property
prediction [0.0]
We propose a simple method to improve chemical property prediction performance of machine learning models.
We show the relation between the performance of property prediction models and the distance between property prediction dataset and the larger unlabeled dataset.
arXiv Detail & Related papers (2022-01-13T12:57:11Z) - Do Large Scale Molecular Language Representations Capture Important
Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively, when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.