Exploring Molecular Odor Taxonomies for Structure-based Odor Predictions using Machine Learning
- URL: http://arxiv.org/abs/2508.09217v1
- Date: Mon, 11 Aug 2025 18:56:50 GMT
- Title: Exploring Molecular Odor Taxonomies for Structure-based Odor Predictions using Machine Learning
- Authors: Akshay Sajan, Stijn Sluis, Reza Haydarlou, Sanne Abeln, Pasquale Lisena, Raphael Troncy, Caro Verbeek, Inger Leemans, Halima Mouhib
- Abstract summary: We show that the predictive performance of machine learning models for structure-based odor predictions can be improved using both an expert and a data-driven odor taxonomy. The expert taxonomy is based on semantic and perceptual similarities, while the data-driven taxonomy is based on clustering co-occurrence patterns of odor descriptors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the key challenges in predicting odor from molecular structure is our limited understanding of the odor space and the complexity of the underlying structure-odor relationships. Here, we show that the predictive performance of machine learning models for structure-based odor predictions can be improved using both an expert and a data-driven odor taxonomy. The expert taxonomy is based on semantic and perceptual similarities, while the data-driven taxonomy is based on clustering co-occurrence patterns of odor descriptors directly from the prepared dataset. Both taxonomies improve the predictions of different machine learning models and outperform random groupings of descriptors that do not reflect existing relations between odor descriptors. We assess the quality of both taxonomies through their predictive performance across different odor classes. We also perform an in-depth error analysis that highlights the complexity of odor-structure relationships and identifies potential inconsistencies within the taxonomies, using pear odorants from perfumery as a showcase. The data-driven taxonomy allows us to critically evaluate our expert taxonomy and better understand the molecular odor space. Both taxonomies as well as a full dataset are made available to the community, providing a stepping stone for a future community-driven exploration of the molecular basis of smell. In addition, we provide a detailed multi-layer expert taxonomy including a total of 777 different descriptors from the Pyrfume repository.
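To make the idea of a data-driven taxonomy concrete, here is a minimal sketch of grouping odor descriptors by how often they co-occur on the same molecules. It is not the authors' pipeline: the abstract does not specify the clustering algorithm, so the Jaccard similarity, the threshold-based single-linkage grouping, the function name `cooccurrence_groups`, the threshold value, and the toy labels are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_groups(molecule_labels, threshold=0.5):
    """Group descriptors whose Jaccard co-occurrence similarity is >= threshold.

    molecule_labels: iterable of sets, one set of odor descriptors per molecule.
    Returns a sorted list of sorted descriptor groups.
    (Hypothetical helper; single-linkage grouping is an assumed stand-in
    for whatever clustering the paper actually uses.)
    """
    # Map each descriptor to the set of molecule indices that carry it.
    carriers = defaultdict(set)
    for idx, labels in enumerate(molecule_labels):
        for label in labels:
            carriers[label].add(idx)

    # Union-find structure: every descriptor starts in its own group.
    parent = {d: d for d in carriers}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    # Merge two descriptors when they label sufficiently overlapping molecules.
    for a, b in combinations(sorted(carriers), 2):
        shared = len(carriers[a] & carriers[b])
        total = len(carriers[a] | carriers[b])
        if total and shared / total >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for d in carriers:
        groups[find(d)].add(d)
    return sorted(sorted(g) for g in groups.values())


# Toy annotations (invented for illustration, not taken from the dataset).
toy_labels = [
    {"fruity", "pear"},
    {"fruity", "pear", "sweet"},
    {"fruity", "pear"},
    {"woody", "earthy"},
    {"woody", "earthy"},
    {"sweet"},
]

groups = cooccurrence_groups(toy_labels, threshold=0.5)
print(groups)
```

On this toy input, "fruity" and "pear" always appear together and so fall into one group, as do "woody" and "earthy", while "sweet" stays on its own at a threshold of 0.5. Lowering the threshold merges more loosely related descriptors, which is one way such a grouping could be compared against an expert taxonomy or a random grouping.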
Related papers
- QSAR-Guided Generative Framework for the Discovery of Synthetically Viable Odorants [0.39318191265352187]
Generative artificial intelligence offers a promising approach for de novo molecular design. We present a framework combining a variational autoencoder (VAE) with a quantitative structure-activity relationship (QSAR) model to generate novel odorants.
(2025-12-28T21:06:01Z)
- DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
We present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs. Experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation.
(2025-02-13T18:29:48Z)
- Molecular Odor Prediction Based on Multi-Feature Graph Attention Networks [11.912107063761939]
The Quantitative Structure-Odor Relationship (QSOR) task involves predicting associations between molecular structures and their corresponding odors. We propose a method for QSOR, utilizing Graph Attention Networks to model molecular structures and capture both local and global features. Our approach demonstrates clear advantages in QSOR prediction tasks, offering valuable insights into the application of deep learning in cheminformatics.
(2025-02-03T15:11:24Z)
- Molecular Odor Prediction with Harmonic Modulated Feature Mapping and Chemically-Informed Loss [11.654144823736143]
We introduce a novel feature mapping method and a molecular ensemble optimization loss function. Our method can significantly improve the accuracy of molecular odor prediction across various deep learning models.
(2025-02-03T12:17:51Z)
- Tree-based variational inference for Poisson log-normal models [47.82745603191512]
Hierarchical trees are often used to organize entities based on proximity criteria. Current count-data models do not leverage this structured information. We introduce the PLN-Tree model as an extension of the PLN model for modeling hierarchical count data.
(2024-06-25T08:24:35Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
(2024-03-02T00:56:05Z)
- Molecule Generation and Optimization for Efficient Fragrance Creation [0.0]
This research introduces a Machine Learning-centric approach to replicate olfactory experiences. Key contributions encompass a hybrid model connecting perfume molecular structure to human olfactory perception. The methodology is validated by reproducing two distinct olfactory experiences using available experimental data.
(2024-02-19T13:32:30Z)
- Olfactory Label Prediction on Aroma-Chemical Pairs [0.2749898166276853]
We present graph neural network models capable of accurately predicting the odor qualities arising from blends of aroma-chemicals. In this paper, we apply both existing and novel approaches to a dataset we gathered consisting of labeled pairs of molecules.
(2023-12-26T17:18:09Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning. To take the power of both worlds, we propose a novel X-model. X-model plays a minimax game between the feature extractor and task-specific heads.
(2021-10-09T13:56:48Z)
- Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges. Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients. FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
(2021-09-15T12:49:13Z)
- Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [67.26804972901952]
We present a self-supervised end-to-end framework, Octet, for Online Catalog EnrichmenT. We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure. Octet enriches an online catalog in production to 2 times larger in the open-world evaluation.
(2020-06-18T04:53:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.