From Molecules to Mixtures: Learning Representations of Olfactory Mixture Similarity using Inductive Biases
- URL: http://arxiv.org/abs/2501.16271v1
- Date: Mon, 27 Jan 2025 18:05:28 GMT
- Title: From Molecules to Mixtures: Learning Representations of Olfactory Mixture Similarity using Inductive Biases
- Authors: Gary Tom, Cher Tian Ser, Ella M. Rajaonson, Stanley Lo, Hyun Suk Park, Brian K. Lee, Benjamin Sanchez-Lengeling, et al.
- Abstract summary: Olfaction -- how molecules are perceived as odors to humans -- remains poorly understood.
Recently, the principal odor map (POM) was introduced to digitize the olfactory properties of single compounds.
In this work, we introduce POMMix, an extension of the POM to represent mixtures.
- Abstract: Olfaction -- how molecules are perceived as odors to humans -- remains poorly understood. Recently, the principal odor map (POM) was introduced to digitize the olfactory properties of single compounds. However, smells in real life are not pure single molecules, but complex mixtures of molecules, whose representations remain relatively under-explored. In this work, we introduce POMMix, an extension of the POM to represent mixtures. Our representation builds upon the symmetries of the problem space in a hierarchical manner: (1) graph neural networks for building molecular embeddings, (2) attention mechanisms for aggregating molecular representations into mixture representations, and (3) cosine prediction heads to encode olfactory perceptual distance in the mixture embedding space. POMMix achieves state-of-the-art predictive performance across multiple datasets. We also evaluate the generalizability of the representation on multiple splits when applied to unseen molecules and mixture sizes. Our work advances the effort to digitize olfaction, and highlights the synergy of domain expertise and deep learning in crafting expressive representations in low-data regimes.
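The three-level hierarchy described in the abstract (GNN molecular embeddings, attention-based aggregation into mixture embeddings, and a cosine head for perceptual distance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embedding dimension, the single learned query vector, and the random inputs are all hypothetical stand-ins for the GNN outputs and trained attention parameters.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_mixture(mol_embeddings, query):
    """Attention-pool per-molecule embeddings (stand-ins for GNN/POM
    outputs) into a single mixture embedding."""
    scores = mol_embeddings @ query      # one score per molecule
    weights = softmax(scores)            # attention weights sum to 1
    return weights @ mol_embeddings      # weighted sum -> (dim,) vector

def cosine_distance(a, b):
    """Perceptual distance between two mixture embeddings."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

# Hypothetical data: a 3-component and a 5-component mixture, dim 8.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
mix_a = aggregate_mixture(rng.normal(size=(3, 8)), query)
mix_b = aggregate_mixture(rng.normal(size=(5, 8)), query)
print(cosine_distance(mix_a, mix_b))
```

Note that the attention pooling handles variable mixture sizes naturally, which is one reason the abstract can speak of generalizing to unseen mixture sizes.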
Related papers
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML), a paradigm shift in encoding molecular graphs.
KCHML conceptualizes molecules through three distinct graph views -- molecular, elemental, and pharmacological -- enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM)
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z) - SE3Set: Harnessing equivariant hypergraph neural networks for molecular representation learning [27.713870291922333]
We develop an SE(3) equivariant hypergraph neural network architecture tailored for advanced molecular representation learning.
SE3Set has shown performance on par with state-of-the-art (SOTA) models for small molecule datasets.
It excels on the MD22 dataset, achieving a notable improvement of approximately 20% in accuracy across all molecules.
arXiv Detail & Related papers (2024-05-26T10:43:16Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry [6.049566024728809]
Deep learning-based molecular property prediction has emerged as a solution to the resource-intensive nature of traditional methods.
In this paper, we propose a novel multi-modal representation learning model, called SGGRL, for molecular property prediction.
To ensure consistency across modalities, SGGRL is trained to maximize the similarity of representations for the same molecule while minimizing similarity for different molecules.
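The cross-modal training objective described for SGGRL (pull representations of the same molecule together, push different molecules apart) is a standard contrastive setup, sketched below as a generic InfoNCE-style loss in NumPy. This is an illustration of the objective family, not SGGRL's code; the temperature value and two-view setup are assumptions.

```python
import numpy as np

def contrastive_loss(z1, z2, temperature=0.5):
    """InfoNCE-style loss over two modality views.
    Rows of z1 and z2 are embeddings of the same molecules in two
    modalities (e.g. sequence and graph): matching rows are positives,
    all other rows in the batch are negatives."""
    # L2-normalize so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                    # pairwise sims
    logits -= logits.max(axis=1, keepdims=True)         # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # diag = positives
```

As a sanity check, the loss is smaller when the two views agree row-for-row than when the pairing is scrambled.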
arXiv Detail & Related papers (2024-01-07T02:18:00Z) - MolSets: Molecular Graph Deep Sets Learning for Mixture Property Modeling [14.067533753010897]
We present MolSets, a specialized machine learning model for molecular mixtures.
Representing individual molecules as graphs and their mixture as a set, MolSets aggregates them at the mixture level.
We demonstrate the efficacy of MolSets in predicting the conductivity of lithium battery electrolytes and highlight its benefits in virtual screening of the chemical space.
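The set-based view behind MolSets can be made concrete with a Deep-Sets-style pooling step: because a mixture is an unordered set of components, the aggregation must be permutation-invariant. The sketch below is a generic illustration of that property, not the MolSets code; the composition weights (e.g. molar fractions) and shapes are hypothetical.

```python
import numpy as np

def mixture_embedding(mol_embeddings, weights):
    """Pool per-molecule embeddings (stand-ins for graph-encoder
    outputs) into one mixture vector, weighted by composition.
    A weighted sum is invariant to the ordering of the components,
    so permuting the molecules leaves the result unchanged."""
    weighted = mol_embeddings * weights[:, None]  # scale each molecule
    return weighted.sum(axis=0)                   # order-independent pool
```

The permutation invariance is easy to verify: shuffling the molecules and their weights together gives the identical mixture embedding.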
arXiv Detail & Related papers (2023-12-27T08:46:14Z) - Learning Harmonic Molecular Representations on Riemannian Manifold [18.49126496517951]
Molecular representation learning plays a crucial role in AI-assisted drug discovery research.
We propose a Harmonic Molecular Representation learning framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface.
arXiv Detail & Related papers (2023-03-27T18:02:47Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control into the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore those found by existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - Multi-Objective Molecule Generation using Interpretable Substructures [38.637412590671865]
Drug discovery aims to find novel compounds with specified chemical property profiles.
The goal is to learn to sample molecules in the intersection of multiple property constraints.
We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales.
arXiv Detail & Related papers (2020-02-08T22:55:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.