Related papers: Conditional Generative Framework with Peak-Aware Attention for Robust Chemical Detection under Interferences

URL: http://arxiv.org/abs/2601.21246v1
Date: Thu, 29 Jan 2026 04:10:37 GMT
Title: Conditional Generative Framework with Peak-Aware Attention for Robust Chemical Detection under Interferences
Authors: Namkyung Yoon, Sanghong Kim, Hwangnam Kim,
Abstract summary: In this paper, we propose an artificial intelligence discrimination framework based on a peak-aware conditional generative model. The framework is learned with a novel peak-aware mechanism that highlights the characteristic peaks of GC-MS data. In addition, chemical and solvent information is encoded in a latent vector embedded with it, allowing a conditional generative adversarial neural network to generate a synthetic GC-MS signal.
Score: 3.976291254896486
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gas chromatography-mass spectrometry (GC-MS) is a widely used analytical method for chemical substance detection, but measurement reliability tends to deteriorate in the presence of interfering substances. In particular, interfering substances cause nonspecific peaks, retention time shifts, and increased background noise, resulting in reduced sensitivity and false alarms. To overcome these challenges, in this paper, we propose an artificial intelligence discrimination framework based on a peak-aware conditional generative model to improve the reliability of GC-MS measurements under interference conditions. The framework is learned with a novel peak-aware mechanism that highlights the characteristic peaks of GC-MS data, allowing it to generate important spectral features more faithfully. In addition, chemical and solvent information is encoded in a latent vector embedded with it, allowing a conditional generative adversarial neural network (CGAN) to generate a synthetic GC-MS signal consistent with the experimental conditions. This generates an experimental dataset that assumes indirect substance situations in chemical substance data, where acquisition is limited without conducting real experiments. These data are used for the learning of AI-based GC-MS discrimination models to help in accurate chemical substance discrimination. We conduct various quantitative and qualitative evaluations of the generated simulated data to verify the validity of the proposed framework. We also verify how the generative model improves the performance of the AI discrimination framework. Representatively, the proposed method is shown to consistently achieve cosine similarity and Pearson correlation coefficient values above 0.9 while preserving peak number diversity and reducing false alarms in the discrimination model.
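The abstract reports cosine similarity and Pearson correlation coefficient values above 0.9 between generated and measured signals. As a rough illustration of what those two metrics measure, here is a minimal sketch in plain Python. The function names and the toy intensity values are hypothetical and are not taken from the paper; the authors' actual evaluation code and preprocessing are not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy traces (assumption): normalized intensities sampled at the
# same time points for a measured signal and a generated signal.
real = [0.0, 0.1, 0.9, 0.2, 0.0, 0.5, 0.1]
synthetic = [0.0, 0.2, 0.8, 0.3, 0.1, 0.4, 0.1]

cos = cosine_similarity(real, synthetic)
pcc = pearson_correlation(real, synthetic)
print(f"cosine similarity: {cos:.3f}, Pearson correlation: {pcc:.3f}")
```

Cosine similarity compares the overall shape of the two signals including their baseline, while the Pearson coefficient removes each signal's mean first, so a constant background offset affects the former but not the latter.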

Related papers

Contextual Invertible World Models: A Neuro-Symbolic Agentic Framework for Colorectal Cancer Drug Response [4.796382757669091]
Precision oncology is currently limited by the small-N, large-P paradox. We present a Neuro-Symbolic Agentic Framework that bridges this gap. Our framework provides a transparent, biologically grounded path towards explainable AI in cancer research.
arXiv Detail & Related papers (2026-03-01T16:15:58Z)
Data-driven Synthesis of Magnetic Resonance Spectroscopy Data using a Variational Autoencoder [1.7789378551794652]
We propose a data-driven framework for synthesizing in-vivo MRS data using a variational autoencoder (VAE) trained exclusively on measured single-voxel spectroscopy data. The VAE learns a low-dimensional latent representation of complex-valued spectra and enables generation of new samples through latent-space sampling and synthesis. The results demonstrate that the VAE accurately reconstructs dominant spectral patterns and generates synthetic spectra that occupy the same feature space as in-vivo data.
arXiv Detail & Related papers (2026-02-28T16:52:16Z)
SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By utilizing benchmarks that deriving and deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition towards states, offering both theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
Comparative Analysis of Formula and Structure Prediction from Tandem Mass Spectra [3.2243643829769586]
Liquid chromatography mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure detectable small molecules in biological samples. Findings have established realistic performance baselines, identified critical bottlenecks, and provided guidance to further improve compound predictions based on MS.
arXiv Detail & Related papers (2026-01-02T16:20:13Z)
Pretraining Transformer-Based Models on Diffusion-Generated Synthetic Graphs for Alzheimer's Disease Prediction [0.0]
We propose a Transformer-based diagnostic framework that combines synthetic data generation with graph representation learning and transfer learning. A class-conditional denoising diffusion probabilistic model (DDPM) is trained on the real-world NACC dataset to generate a large synthetic cohort. Modality-specific Graph Transformer encoders are first pretrained on this synthetic data to learn robust, class-discriminative representations.
arXiv Detail & Related papers (2025-11-24T19:34:53Z)
Towards Overcoming Data Scarcity in Nuclear Energy: A Study on Critical Heat Flux with Physics-consistent Conditional Diffusion Model [44.6164235303852]
Deep generative models, such as the diffusion model (DM), can generate high-fidelity synthetic samples that statistically resemble the training data. This paper investigates the effectiveness of DM for overcoming data scarcity in nuclear energy applications.
arXiv Detail & Related papers (2025-11-20T10:19:03Z)
AI-driven Generation of MALDI-TOF MS for Microbial Characterization [1.3155923068686746]
This study investigates the use of deep generative models to synthesize realistic MALDI-TOF MS spectra. We adapt and evaluate three generative models: Variational Autoencoders (MALDIVAEs), Generative Adversarial Networks (MALDIGANs), and Denoising Probabilistic Model (MALDIffusion). Experiments show that synthetic data generated by MALDIVAE, MALDIGAN, and MALDIffusion are statistically and diagnostically comparable to real measurements.
arXiv Detail & Related papers (2025-11-18T10:01:21Z)
Rapid Machine Learning-Driven Detection of Pesticides and Dyes Using Raman Spectroscopy [0.5002873541686897]
Pesticides and synthetic dyes pose critical threats to food safety, human health, and environmental sustainability. Raman spectroscopy offers molecularly specific fingerprints but suffers from spectral noise, fluorescence background, and band overlap. Here, we propose a deep learning framework based on ResNet-18 feature extraction to detect pesticides and dyes from Raman spectroscopy, called MLRaman.
arXiv Detail & Related papers (2025-11-15T11:35:55Z)
Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way. We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z)
Chemical knowledge-informed framework for privacy-aware retrosynthesis learning [72.39098405805318]
Current machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models. This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
arXiv Detail & Related papers (2025-02-26T13:13:24Z)
Synthetic Time Series Data Generation for Healthcare Applications: A PCG Case Study [43.28613210217385]
We employ and compare three state-of-the-art generative models to generate PCG data. Our results demonstrate that the generated PCG data closely resembles the original datasets. In our future work, we plan to incorporate this method into a data augmentation pipeline to synthesize abnormal PCG signals with heart murmurs.
arXiv Detail & Related papers (2024-12-17T18:07:40Z)
A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics [0.0]
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments. A novel aspect of our model is that the kernel contains a scaling parameter that controls the strength of the correlation between elements of the chemical space. We present a genetic algorithm for the facilitation of chemical discovery and identification of important features to the compound's efficacy.
arXiv Detail & Related papers (2024-05-16T11:18:32Z)
Brain Imaging-to-Graph Generation using Adversarial Hierarchical Diffusion Models for MCI Causality Analysis [44.45598796591008]
Brain imaging-to-graph generation (BIGG) framework is proposed to map functional magnetic resonance imaging (fMRI) into effective connectivity for mild cognitive impairment analysis. The hierarchical transformers in the generator are designed to estimate the noise at multiple scales. Evaluations of the ADNI dataset demonstrate the feasibility and efficacy of the proposed model.
arXiv Detail & Related papers (2023-05-18T06:54:56Z)
Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) offer promise of non-invasive techniques. The problem of source localization, or source imaging, poses however a high-dimensional statistical inference challenge. We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.