Mol-PECO: a deep learning model to predict human olfactory perception from molecular structures
- URL: http://arxiv.org/abs/2305.12424v1
- Date: Sun, 21 May 2023 10:44:02 GMT
- Title: Mol-PECO: a deep learning model to predict human olfactory perception from molecular structures
- Authors: Mengji Zhang, Yusuke Hiki, Akira Funahashi, Tetsuya J. Kobayashi
- Abstract summary: We develop a deep learning model called Mol-PECO to predict olfactory perception from molecular structures.
With a comprehensive dataset of 8,503 molecules, Mol-PECO directly achieves an area under the receiver operating characteristic curve (AUROC) of 0.813 across 118 odor descriptors.
Our work may promote the understanding and decoding of the olfactory sense and mechanisms.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While visual and auditory information conveyed by wavelength of light and
frequency of sound have been decoded, predicting olfactory information encoded
by the combination of odorants remains challenging due to the unknown and
potentially discontinuous perceptual space of smells and odorants. Herein, we
develop a deep learning model called Mol-PECO (Molecular Representation by
Positional Encoding of Coulomb Matrix) to predict olfactory perception from
molecular structures. Mol-PECO updates the learned atom embeddings using
directional graph convolutional networks (GCN), which model the Laplacian
eigenfunctions as positional encoding, and the Coulomb matrix, which encodes
atomic coordinates and charges. With a comprehensive dataset of 8,503 molecules,
Mol-PECO directly achieves an area under the receiver operating characteristic
curve (AUROC) of 0.813 across 118 odor descriptors, outperforming machine
learning on molecular fingerprints (AUROC of 0.761) and a GCN on the adjacency
matrix (AUROC of 0.678). The embeddings learned by Mol-PECO also capture a
meaningful odor space with global clustering of descriptors and local retrieval
of similar odorants.
Our work may promote the understanding and decoding of the olfactory sense and
mechanisms.
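The abstract names two structural inputs: Laplacian eigenfunctions used as positional encoding for the directional GCN, and the Coulomb matrix built from atomic coordinates and charges. As a rough illustration only, the NumPy sketch below computes both for toy inputs. It assumes the common Coulomb-matrix convention of Rupp et al. (diagonal 0.5 * Z_i^2.4, off-diagonal Z_i * Z_j / |R_i - R_j|) and the symmetric normalized graph Laplacian; the function names are hypothetical, and neither the exact variants nor the directional aggregation used by Mol-PECO are taken from the paper.

```python
import numpy as np


def coulomb_matrix(charges, coords):
    """Coulomb matrix of one molecule (assumed Rupp et al. convention).

    charges: nuclear charges Z_i, shape (n,)
    coords:  Cartesian coordinates R_i, shape (n, 3)
    """
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances |R_i - R_j|, shape (n, n).
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    n = len(Z)
    M = np.empty((n, n))
    off_diag = ~np.eye(n, dtype=bool)
    # Off-diagonal: Coulomb interaction between nuclei i and j.
    M[off_diag] = np.outer(Z, Z)[off_diag] / dist[off_diag]
    # Diagonal: self-interaction term 0.5 * Z^2.4.
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def laplacian_positional_encoding(adjacency, k):
    """Eigenpairs of the symmetric normalized Laplacian L = I - D^-1/2 A D^-1/2.

    Returns the k smallest non-trivial eigenvalues and their eigenvectors
    (one k-dimensional positional vector per atom). Assumes a connected
    graph, so exactly one eigenvalue is 0 and it is skipped.
    """
    A = np.asarray(adjacency, dtype=float)
    deg = A.sum(axis=1)
    # Isolated atoms (degree 0) get a zero scaling factor instead of inf.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(L)
    # Eigenvector signs are arbitrary; models often flip them randomly in training.
    return eigvals[1:k + 1], eigvecs[:, 1:k + 1]


# Toy usage: H2 at 0.74 Å (units illustrative) and the heavy-atom graph of
# ethanol (C-C-O) as a 3-node path.
cm = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.74, 0.0, 0.0]])
vals, pe = laplacian_positional_encoding([[0, 1, 0], [1, 0, 1], [0, 1, 0]], k=2)
print(cm)
print(vals)  # approximately [1. 2.] for a 3-node path
```

In a real pipeline the adjacency matrix and 3D coordinates would come from a cheminformatics toolkit; how Mol-PECO combines these inputs inside its directional GCN is described in the paper, not here.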
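The headline result is a single AUROC over 118 odor descriptors. The abstract does not say how per-descriptor scores are combined; a common choice for multi-label tasks, assumed here, is the unweighted (macro) mean of per-descriptor AUROCs. The sketch below computes it in NumPy from the Mann-Whitney formulation of AUROC; the function names and toy data are hypothetical.

```python
import numpy as np


def binary_auroc(y_true, scores):
    """AUROC for one descriptor via the Mann-Whitney U statistic.

    Equals the probability that a random positive molecule scores higher
    than a random negative one, counting ties as one half.
    """
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        return np.nan  # undefined if a descriptor has only one class
    # All positive-vs-negative comparisons (O(P*N) memory; fine for a sketch).
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)


def macro_auroc(Y_true, Y_score):
    """Unweighted mean of per-descriptor AUROCs; columns are descriptors."""
    Y_true = np.asarray(Y_true)
    Y_score = np.asarray(Y_score)
    per_descriptor = np.array(
        [binary_auroc(Y_true[:, j], Y_score[:, j]) for j in range(Y_true.shape[1])]
    )
    # nanmean skips descriptors whose AUROC is undefined.
    return float(np.nanmean(per_descriptor)), per_descriptor


# 4 molecules x 2 hypothetical descriptors (1 = descriptor applies).
Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.3], [0.2, 0.4], [0.8, 0.2], [0.1, 0.5]]
mean_auc, per_desc = macro_auroc(Y_true, Y_score)
print(per_desc)  # per-descriptor AUROCs 1.0 and 0.25
print(mean_auc)  # 0.625
```

In practice, scikit-learn's `sklearn.metrics.roc_auc_score` with `average="macro"` computes the same quantity for multi-label indicator arrays, provided every descriptor has both positive and negative examples.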
Related papers
- De novo molecular structure elucidation from mass spectra via flow matching
We develop MSFlow, a two-stage encoder-decoder flow-matching generative model that achieves state-of-the-art performance on the structure elucidation task for small molecules. MSFlow can accurately translate up to 45 percent of molecular mass spectra into their corresponding molecular representations, an improvement of up to fourteen-fold over the current state of the art.
(arXiv, 2026-02-23T14:52:53Z)
- How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning?
Large language models (LLMs) have shown promise for reasoning-intensive scientific tasks, but their capability for chemical interpretation is still unclear. We introduce a Chain-of-Thought (CoT) prompting framework and benchmark that evaluate how LLMs reason about mass spectral data to predict molecular structures. Our evaluation across metrics of SMILES validity, formula consistency, and structural similarity reveals that while LLMs can produce syntactically valid and partially plausible structures, they fail to achieve chemical accuracy or link reasoning to correct molecular predictions.
(arXiv, 2026-01-09T20:08:42Z)
- QSAR-Guided Generative Framework for the Discovery of Synthetically Viable Odorants
Generative artificial intelligence offers a promising approach for de novo molecular design. We present a framework combining a variational autoencoder (VAE) with a quantitative structure-activity relationship (QSAR) model to generate novel odorants.
(arXiv, 2025-12-28T21:06:01Z)
- Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R² of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
(arXiv, 2025-07-22T04:35:50Z)
- Knowledge-aware contrastive heterogeneous molecular graph learning
We propose a paradigm shift by encoding molecular graphs with Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML).
KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
(arXiv, 2025-02-17T11:53:58Z)
- DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra
DiffMS is a formula-restricted encoder-decoder generative network.
We develop a robust decoder that bridges latent embeddings and molecular structures.
Experiments show DiffMS outperforms existing models on de novo molecule generation.
(arXiv, 2025-02-13T18:29:48Z)
- Pre-trained Molecular Language Models with Random Functional Group Masking
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
(arXiv, 2024-11-03T01:56:15Z)
- FARM: Functional Group-Aware Representations for Small Molecules
We introduce Functional Group-Aware Representations for Small Molecules (FARM).
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
(arXiv, 2024-10-02T23:04:58Z)
- Molecular Identification via Molecular Fingerprint extraction from Atomic Force Microscopy images
Deep learning models can retrieve chemical and structural information encoded in a 3D stack of constant-height HR-AFM images.
In this work, we overcome the limitations of these models by using a well-established description of the molecular structure in terms of topological fingerprints.
We show that it is possible to complement the fingerprint-based virtual screening with global information provided by another DL model.
(arXiv, 2024-05-07T13:47:35Z)
- MolTC: Towards Molecular Relational Modeling In Language Models
We propose a novel framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC.
Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN- and LLM-based baselines.
(arXiv, 2024-02-06T07:51:56Z)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
(arXiv, 2023-06-08T17:12:08Z)
- Multiresolution Graph Transformers and Wavelet Positional Encoding for Learning Hierarchical Structures
We propose Multiresolution Graph Transformers (MGT), the first graph transformer architecture that can learn to represent large molecules at multiple scales.
MGT can learn to produce representations for the atoms and group them into meaningful functional groups or repeating units.
Our proposed model achieves results on two macromolecule datasets consisting of polymers and peptides, and one drug-like molecule dataset.
(arXiv, 2023-02-17T01:32:44Z)
- Building Open Knowledge Graph for Metal-Organic Frameworks (MOF-KG): Challenges and Case Studies
Metal-Organic Frameworks (MOFs) have great potential to revolutionize applications such as gas storage, molecular separations, chemical sensing, crystalline and drug delivery.
The Cambridge Structural Database (CSD) reports 10,636 synthesized MOF crystals and additionally contains ca. 114,373 MOF-like structures.
In this demo paper, we describe our effort on leveraging knowledge graph methods to facilitate MOF prediction, discovery, and synthesis.
(arXiv, 2022-07-10T16:41:11Z)
- Unsupervised Spectral Unmixing For Telluric Correction Using A Neural Network Autoencoder
We present a neural network autoencoder approach for extracting a telluric transmission spectrum from a large set of high-precision observed solar spectra from the HARPS-N radial velocity spectrograph.
(arXiv, 2021-11-17T12:54:48Z)
- Chemical-Reaction-Aware Molecule Representation Learning
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective at 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
(arXiv, 2021-09-21T00:08:43Z)
- Do Large Scale Molecular Language Representations Capture Important Structural Information?
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively when compared to graph-based and fingerprint-based supervised learning baselines.
(arXiv, 2021-06-17T14:33:55Z)
- Knowledge-aware Contrastive Molecular Graph Learning
We propose Contrastive Knowledge-aware GNN (CKGNN) for self-supervised molecular representation learning.
We explicitly encode domain knowledge via knowledge-aware molecular encoder under the contrastive learning framework.
Experiments on 8 public datasets demonstrate the effectiveness of our model with a 6% absolute improvement on average.
(arXiv, 2021-03-24T08:55:08Z)