Unmasking Airborne Threats: Guided-Transformers for Portable Aerosol Mass Spectrometry
- URL: http://arxiv.org/abs/2511.17446v1
- Date: Fri, 21 Nov 2025 17:45:00 GMT
- Title: Unmasking Airborne Threats: Guided-Transformers for Portable Aerosol Mass Spectrometry
- Authors: Kyle M. Regan, Michael McLoughlin, Wayne A. Bryden, Gonzalo R. Arce,
- Abstract summary: Matrix Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) is a cornerstone in biomolecular analysis, offering precise identification of pathogens through unique mass spectral signatures.<n>Yet, its reliance on labor-intensive sample preparation and multi-shot spectral averaging restricts its use to laboratory settings, rendering it impractical for real-time environmental monitoring.<n>These limitations are especially pronounced in emerging aerosol MALDI-MS systems, where autonomous sampling generates noisy spectra for unknown aerosol analytes.<n>We propose the Mass Spectral Dictionary-Guided Transformer (MS-DGFormer), a data-driven framework that redefines spectral
- Score: 2.743898388459522
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Matrix Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) is a cornerstone in biomolecular analysis, offering precise identification of pathogens through unique mass spectral signatures. Yet, its reliance on labor-intensive sample preparation and multi-shot spectral averaging restricts its use to laboratory settings, rendering it impractical for real-time environmental monitoring. These limitations are especially pronounced in emerging aerosol MALDI-MS systems, where autonomous sampling generates noisy spectra for unknown aerosol analytes, requiring single-shot detection for effective analysis. Addressing these challenges, we propose the Mass Spectral Dictionary-Guided Transformer (MS-DGFormer): a data-driven framework that redefines spectral analysis by directly processing raw, minimally prepared mass spectral data. MS-DGFormer leverages a transformer architecture, designed to capture the long-range dependencies inherent in these time-series spectra. To enhance feature extraction, we introduce a novel dictionary encoder that integrates denoised spectral information derived from Singular Value Decomposition (SVD), enabling the model to discern critical biomolecular patterns from single-shot spectra with robust performance. This innovation provides a system to achieve superior pathogen identification from aerosol samples, facilitating autonomous, real-time analysis in field conditions. By eliminating the need for extensive preprocessing, our method unlocks the potential for portable, deployable MALDI-MS platforms, revolutionizing environmental pathogen detection and rapid response to biological threats.
Related papers
- From Static Spectra to Operando Infrared Dynamics: Physics Informed Flow Modeling and a Benchmark [67.29937933325849]
Operando IR Prediction aims to forecast the time-resolved evolution of spectral fingerprints'' from a single static spectrum.<n>OpIRSpec-7K comprises 7,118 high-quality samples across 10 distinct battery systems.<n>ABCC significantly outperforms state-of-the-art static, sequential, and generative baselines.
arXiv Detail & Related papers (2026-02-20T18:58:43Z) - SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse.<n>By utilizing benchmarks that deriving and deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space.<n>We demonstrate that SIGMA effectively captures the transition towards states, offering both theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z) - Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra [60.08608779794957]
We propose GLMR, a Generative Language Model-based Retrieval framework.<n>In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum.<n>In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures.
arXiv Detail & Related papers (2025-11-09T07:25:53Z) - MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation [20.973121120131875]
Large-scale pretraining has proven effective in addressing data scarcity in other domains.<n>We propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary.<n>Extensive evaluations demonstrate that MS-BART achieves SOTA performance across 5/12 key metrics on MassSpecGym and NPLIB1.
arXiv Detail & Related papers (2025-10-23T14:45:28Z) - SpectrumFM: Redefining Spectrum Cognition via Foundation Modeling [65.65474629224558]
We propose a spectrum foundation model, termed SpectrumFM, which provides a new paradigm for spectrum cognition.<n>An innovative spectrum encoder that exploits the convolutional neural networks is proposed to effectively capture both fine-grained local signal structures and high-level global dependencies in the spectrum data.<n>Two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, are developed for pre-training SpectrumFM, enabling the model to learn rich and transferable representations.
arXiv Detail & Related papers (2025-08-02T14:40:50Z) - LUMIR: an LLM-Driven Unified Agent Framework for Multi-task Infrared Spectroscopy Reasoning [12.138903544219724]
This study introduces LUMIR, a framework designed to achieve accurate infrared spectral analysis under low data conditions.<n> LUMIR integrates a structured literature knowledge base, automated preprocessing, feature extraction, and predictive modeling into a unified pipeline.<n>It was validated on diverse datasets, including the publicly available Milk near-infrared dataset, Chinese medicinal herbs, Citri Reticulatae Pericarpium(CRP) with different storage durations, an industrial wastewater COD dataset, Tecator and Corn.
arXiv Detail & Related papers (2025-07-29T03:20:51Z) - CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis [69.02751635551724]
Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding.<n> variability in channel dimensionality and captured wavelengths among spectral cameras impede the development of AI-driven methodologies.<n>We introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities.
arXiv Detail & Related papers (2025-04-27T13:06:40Z) - DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
We present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task.<n>To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs.<n>Experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z) - DiffRaman: A Conditional Latent Denoising Diffusion Probabilistic Model for Bacterial Raman Spectroscopy Identification Under Limited Data Conditions [11.586869210490628]
This paper proposes a data generation method utilizing deep generative models to expand the data volume and enhance the recognition accuracy of bacterial Raman spectra.<n> Experimental results demonstrate that synthetic bacterial Raman spectra generated by DiffRaman can effectively emulate real experimental spectra.
arXiv Detail & Related papers (2024-12-11T06:36:55Z) - Machine-learning-enhanced time-of-flight mass spectrometry analysis [10.16825220733013]
We introduce an approach that leverages modern machine learning technique to identify peak patterns in time-of-flight mass spectra within microseconds.
Our approach is cross-validated on mass spectra generated from different time-of-flight mass spectrometry(ToF-MS) techniques, offering the ToF-MS community an open-source, intelligent mass spectra analysis.
arXiv Detail & Related papers (2020-10-02T14:35:47Z) - Fast and automated biomarker detection in breath samples with machine
learning [1.2026897155625271]
Volatile organic compounds (VOCs) in human breath can reveal a large spectrum of health conditions.
Gas chromatography-mass spectrometry (GC-MS) is used to measure VOCs, but its application is limited by expert-driven data analysis.
We propose a system to perform GC-MS data analysis that exploits deep learning pattern recognition ability to learn and automatically detect VOCs.
arXiv Detail & Related papers (2020-05-24T11:44:28Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.