To Bin or not to Bin: Alternative Representations of Mass Spectra
- URL: http://arxiv.org/abs/2502.10851v1
- Date: Sat, 15 Feb 2025 16:52:36 GMT
- Title: To Bin or not to Bin: Alternative Representations of Mass Spectra
- Authors: Niek de Jonge, Justin J. J. van der Hooft, Daniel Probst
- Abstract summary: We investigate two alternatives to the binning of mass spectra before downstream machine learning tasks, namely, set-based and graph-based representations.
Using the two proposed representations to train a set transformer and a graph neural network, respectively, on a regression task, we show that both perform substantially better than a multilayer perceptron trained on binned data.
- Score: 0.0
- Abstract: Mass spectrometry, especially so-called tandem mass spectrometry, is commonly used to assess the chemical diversity of samples. The resulting mass fragmentation spectra are representations of molecules whose structure may not have been determined. This poses the challenge of experimentally determining or computationally predicting molecular structures from mass spectra. An alternative option is to predict molecular properties or molecular similarity directly from spectra. Various methodologies have been proposed to embed mass spectra for further use in machine learning tasks. However, these methodologies require preprocessing of the spectra, which often includes binning or sub-sampling peaks, mainly to create uniform vector sizes and remove noise. Here, we investigate two alternatives to the binning of mass spectra before downstream machine learning tasks, namely, set-based and graph-based representations. Using the two proposed representations to train a set transformer and a graph neural network, respectively, on a regression task, we show that both perform substantially better than a multilayer perceptron trained on binned data.
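To make the three representations named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The function names, the 1.0 m/z bin width, the 0 to 1000 m/z range, and the choice of a fully connected peak graph with m/z differences as edge features are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from itertools import combinations

# A spectrum is a list of (m/z, intensity) peaks. All names and defaults
# below are hypothetical and chosen only for illustration.

def bin_spectrum(peaks, mz_min=0.0, mz_max=1000.0, bin_width=1.0):
    """Binned representation: sum intensities into fixed-width m/z bins.

    Every spectrum maps to a vector of the same length, but peaks that
    fall into the same bin become indistinguishable.
    """
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int((mz - mz_min) // bin_width)
        if 0 <= idx < n_bins:  # drop peaks outside the binned range
            vec[idx] += intensity
    return vec

def set_representation(peaks):
    """Set-based representation: keep each peak as its own element.

    The length varies per spectrum, so the downstream model must be
    permutation-invariant (for example, a set transformer).
    """
    return [(mz, intensity) for mz, intensity in peaks]

def graph_representation(peaks):
    """Graph-based representation (assumed construction).

    Nodes are peaks; every pair of peaks is connected by an edge whose
    feature is the absolute m/z difference between the two peaks.
    """
    nodes = [(mz, intensity) for mz, intensity in peaks]
    edges = [
        (i, j, abs(nodes[i][0] - nodes[j][0]))
        for i, j in combinations(range(len(nodes)), 2)
    ]
    return nodes, edges

spectrum = [(100.02, 0.2), (100.48, 0.5), (250.91, 1.0)]
binned = bin_spectrum(spectrum)
peak_set = set_representation(spectrum)
nodes, edges = graph_representation(spectrum)
```

In this toy spectrum the peaks at 100.02 and 100.48 land in the same bin and their intensities are summed, whereas the set and graph representations keep them as separate elements with their exact m/z values.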
Related papers
- DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network.
We develop a robust decoder that bridges latent embeddings and molecular structures.
Experiments show DiffMS outperforms existing models on $\textit{de novo}$ molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z)
- Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry [0.1747623282473278]
This dataset comprises simulated $^{1}$H-NMR, $^{13}$C-NMR, HSQC-NMR, infrared, and mass spectra for 790k molecules extracted from chemical reactions in patent data.
We provide benchmarks for evaluating single-modality tasks such as structure elucidation, predicting the spectra for a target molecule, and functional group predictions.
arXiv Detail & Related papers (2024-07-04T12:52:48Z)
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of data cube spectra poses a complex task in its statistical interpretation.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
- Mass Spectra Prediction with Structural Motif-based Graph Neural Networks [21.71309513265843]
MoMS-Net is a system that predicts mass spectra using information derived from structural motifs together with Graph Neural Networks (GNNs).
We have tested our model across diverse mass spectra and have observed its superiority over other existing models.
arXiv Detail & Related papers (2023-06-28T10:33:57Z)
- Prefix-Tree Decoding for Predicting Mass Spectra from Molecules [12.868704267691125]
We use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms.
We show promising empirical results on mass spectra prediction tasks.
arXiv Detail & Related papers (2023-03-11T17:44:28Z)
- Ensemble Spectral Prediction (ESP) Model for Metabolite Annotation [10.640447979978436]
A key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities.
We propose a novel machine learning model, Ensemble Spectral Prediction (ESP), for metabolite annotation.
arXiv Detail & Related papers (2022-03-25T17:05:41Z)
- Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra [68.8204255655161]
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets.
We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations.
We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
arXiv Detail & Related papers (2022-01-07T22:26:33Z)
- Gaussian Process Regression for Absorption Spectra Analysis of Molecular Dimers [68.8204255655161]
We discuss an approach based on a machine learning technique, where the parameters for the numerical calculations are chosen using Gaussian Process Regression (GPR).
This approach not only converges quickly to an optimal parameter set but also provides information about the complete parameter space.
We find that indeed the GPR gives reliable results which are in agreement with direct calculations of these parameters using quantum chemical methods.
arXiv Detail & Related papers (2021-12-14T17:46:45Z)
- Unsupervised Spectral Unmixing For Telluric Correction Using A Neural Network Autoencoder [58.720142291102135]
We present a neural network autoencoder approach for extracting a telluric transmission spectrum from a large set of high-precision observed solar spectra from the HARPS-N radial velocity spectrograph.
arXiv Detail & Related papers (2021-11-17T12:54:48Z)
- MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using Graph Transformers [3.2951121243459522]
Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule.
For over seventy years, spectrum prediction has remained a key challenge in the field.
We propose a new model, MassFormer, for accurately predicting tandem mass spectra.
arXiv Detail & Related papers (2021-11-08T20:55:15Z)
- Spectral Analysis Network for Deep Representation Learning and Image Clustering [53.415803942270685]
This paper proposes a new network structure for unsupervised deep representation learning based on spectral analysis.
It can identify local similarities among images at the patch level and is thus more robust to occlusion.
It can learn more clustering-friendly representations and is capable of revealing deep correlations among data samples.
arXiv Detail & Related papers (2020-09-11T05:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.