Rapid Machine Learning-Driven Detection of Pesticides and Dyes Using Raman Spectroscopy
- URL: http://arxiv.org/abs/2511.12167v1
- Date: Sat, 15 Nov 2025 11:35:55 GMT
- Title: Rapid Machine Learning-Driven Detection of Pesticides and Dyes Using Raman Spectroscopy
- Authors: Quach Thi Thai Binh, Thuan Phuoc, Xuan Hai, Thang Bach Phan, Vu Thi Hanh Thu, Nguyen Tuan Hung
- Abstract summary: Pesticides and synthetic dyes pose critical threats to food safety, human health, and environmental sustainability. Raman spectroscopy offers molecularly specific fingerprints but suffers from spectral noise, fluorescence background, and band overlap. Here, we propose a deep learning framework based on ResNet-18 feature extraction to detect pesticides and dyes from Raman spectroscopy, called MLRaman.
- Score: 0.5002873541686897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The extensive use of pesticides and synthetic dyes poses critical threats to food safety, human health, and environmental sustainability, necessitating rapid and reliable detection methods. Raman spectroscopy offers molecularly specific fingerprints but suffers from spectral noise, fluorescence background, and band overlap, limiting its real-world applicability. Here, we propose MLRaman, a deep learning framework that combines ResNet-18 feature extraction with advanced classifiers, including XGBoost, SVM, and their hybrid integration, to detect pesticides and dyes from Raman spectra. MLRaman with the CNN-XGBoost model achieved a predictive accuracy of 97.4% and a perfect AUC of 1.0, while MLRaman with the CNN-SVM model provided competitive results with robust class-wise discrimination. Dimensionality reduction analyses (PCA, t-SNE, UMAP) confirmed the separability of Raman embeddings across 10 analytes, including 7 pesticides and 3 dyes. Finally, we developed a user-friendly Streamlit application for real-time prediction, which successfully identified unseen Raman spectra from our independent experiments and from literature sources, underscoring strong generalization capacity. This study establishes a scalable, practical MLRaman model for multi-residue contaminant monitoring, with significant potential for deployment in food safety and environmental surveillance.
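The abstract reports an accuracy of 97.4% together with an AUC of 1.0. The two figures are compatible: accuracy depends on which class receives the highest score for each sample, whereas a one-vs-rest AUC depends only on how samples are ranked within each class's score column. The sketch below illustrates this with made-up scores for three classes. It assumes a macro-averaged one-vs-rest AUC, which may differ from the averaging used in the paper, and the helper names (`accuracy`, `binary_auc`, `macro_ovr_auc`) are hypothetical.

```python
def accuracy(y_true, scores):
    """Fraction of samples whose highest-scoring class equals the true label."""
    hits = 0
    for label, row in zip(y_true, scores):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += int(predicted == label)
    return hits / len(y_true)


def binary_auc(is_positive, values):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [v for flag, v in zip(is_positive, values) if flag]
    neg = [v for flag, v in zip(is_positive, values) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, scores):
    """Unweighted mean of the one-vs-rest AUC of every class."""
    n_classes = len(scores[0])
    per_class = []
    for c in range(n_classes):
        is_positive = [label == c for label in y_true]
        column = [row[c] for row in scores]
        per_class.append(binary_auc(is_positive, column))
    return sum(per_class) / n_classes


# Illustrative data only: six samples, three classes, invented class scores.
y_true = [0, 0, 1, 1, 2, 2]
scores = [
    [0.80, 0.15, 0.05],
    [0.45, 0.50, 0.05],  # true class 0, but class 1 has the top score
    [0.10, 0.85, 0.05],
    [0.20, 0.70, 0.10],
    [0.05, 0.10, 0.85],
    [0.10, 0.20, 0.70],
]

print(f"accuracy = {accuracy(y_true, scores):.3f}")            # 0.833
print(f"macro OvR AUC = {macro_ovr_auc(y_true, scores):.3f}")  # 1.000
```

The second sample is misclassified by its top score, yet within each class column every true member still outranks every non-member, so each per-class AUC stays at 1.0.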
Related papers
- From Static Spectra to Operando Infrared Dynamics: Physics Informed Flow Modeling and a Benchmark [67.29937933325849]
Operando IR Prediction aims to forecast the time-resolved evolution of spectral fingerprints from a single static spectrum. OpIRSpec-7K comprises 7,118 high-quality samples across 10 distinct battery systems. ABCC significantly outperforms state-of-the-art static, sequential, and generative baselines.
arXiv Detail & Related papers (2026-02-20T18:58:43Z)
- The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks [51.468144272905135]
Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks. We provide a theoretical analysis targeting backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. We propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties.
arXiv Detail & Related papers (2025-12-11T08:09:07Z)
- Unmasking Airborne Threats: Guided-Transformers for Portable Aerosol Mass Spectrometry [2.743898388459522]
Matrix Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) is a cornerstone in biomolecular analysis, offering precise identification of pathogens through unique mass spectral signatures. Yet, its reliance on labor-intensive sample preparation and multi-shot spectral averaging restricts its use to laboratory settings, rendering it impractical for real-time environmental monitoring. These limitations are especially pronounced in emerging aerosol MALDI-MS systems, where autonomous sampling generates noisy spectra for unknown aerosol analytes. We propose the Mass Spectral Dictionary-Guided Transformer (MS-DGFormer), a data-driven framework that redefines spectral
arXiv Detail & Related papers (2025-11-21T17:45:00Z)
- Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra [60.08608779794957]
We propose GLMR, a Generative Language Model-based Retrieval framework. In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum. In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures.
arXiv Detail & Related papers (2025-11-09T07:25:53Z)
- Enhancing ECG Classification Robustness with Lightweight Unsupervised Anomaly Detection Filters [39.9470953186283]
Continuous electrocardiogram (ECG) monitoring via wearables offers significant potential for early cardiovascular disease (CVD) detection. Deploying deep learning models for automated analysis in resource-constrained environments faces reliability challenges due to Out-of-Distribution data. This paper explores Unsupervised Anomaly Detection (UAD) as an independent, upstream filtering mechanism to improve robustness.
arXiv Detail & Related papers (2025-10-30T13:54:37Z)
- Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective [104.09817371557476]
Large language models (LLMs) have achieved impressive results across a range of natural language processing tasks. Their potential to generate harmful content has raised serious safety concerns. We introduce three novel multi-label benchmarks for toxicity detection.
arXiv Detail & Related papers (2025-10-16T06:50:33Z)
- Deep-Learning Investigation of Vibrational Raman Spectra for Plant-Stress Analysis [0.9287179270753105]
Biomolecules within plants serve as key stress indicators, offering vital markers for continuous health monitoring and early disease detection. Traditional Raman analysis relies on customized data processing that requires fluorescence background removal and prior identification of Raman peaks of interest. Here, we introduce DIVA (Deep-learning-based Investigation of Vibrational Raman spectra for plant-stress Analysis), a fully automated workflow based on a variational autoencoder.
arXiv Detail & Related papers (2025-07-21T16:27:34Z)
- Drug classification based on X-ray spectroscopy combined with machine learning [11.985793625437546]
X-ray absorption spectroscopy offers advantages such as ease of operation, penetrative observation, and strong substance differentiation capabilities. In this study, we constructed a classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Particle Swarm Optimization (PSO). The experimental results demonstrate that this model achieved higher classification accuracy than two other common methods, with a prediction accuracy of 99.14%.
arXiv Detail & Related papers (2025-05-04T04:49:55Z)
- Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data [36.92842246372894]
Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN) is a deep learning framework that utilizes messenger-RNA, micro-RNA sequences, and DNA methylation samples. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates robust predictive performance and interpretability.
arXiv Detail & Related papers (2025-03-29T02:14:05Z)
- Explainable Deep Learning Framework for SERS Bio-quantification [12.855316833585908]
This study aims to address present challenges of surface-enhanced Raman spectroscopy (SERS) through a novel SERS bio-quantification framework.
Serotonin quantification in urine media was assessed as a model task with 682 SERS spectra measured in a micromolar range using cucurbit[8]uril chemical spacers.
A novel context representative interpretable model explanations (CRIME) method was developed to suit the current needs of SERS mixture analysis explainability.
arXiv Detail & Related papers (2024-11-12T11:26:56Z)
- Uncovering the Mechanism of Hepatotoxiciy of PFAS Targeting L-FABP Using GCN and Computational Modeling [1.6249398255272316]
Per- and polyfluoroalkyl substances (PFAS) are persistent environmental pollutants with known toxicity and bioaccumulation issues.
This study advances the predictive modeling of PFAS toxicity by combining semi-supervised graph convolutional networks (GCNs) with molecular descriptors and fingerprints.
arXiv Detail & Related papers (2024-09-16T15:13:39Z)
- Determination of Trace Organic Contaminant Concentration via Machine Classification of Surface-Enhanced Raman Spectra [0.7029155133139362]
We show an approach for predicting the concentration of sample pollutants from messy, unprocessed Raman data using machine learning.
Using standard machine learning models, the concentration of sample pollutants is predicted with more than 80 percent cross-validation accuracy from raw Raman data.
arXiv Detail & Related papers (2024-01-31T21:49:40Z)
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that this approach, applied to synthetic sensor responses, surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
- Evaluation of the potential of Near Infrared Hyperspectral Imaging for monitoring the invasive brown marmorated stink bug [53.682955739083056]
The brown marmorated stink bug (BMSB), Halyomorpha halys, is an invasive insect pest of global importance that damages several crops.
The present study is a preliminary laboratory-level evaluation of Near Infrared Hyperspectral Imaging (NIR-HSI) as a possible technology for detecting BMSB specimens.
arXiv Detail & Related papers (2023-01-19T11:37:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.