A universal synthetic dataset for machine learning on spectroscopic data
- URL: http://arxiv.org/abs/2206.06031v2
- Date: Tue, 14 Jun 2022 09:25:53 GMT
- Title: A universal synthetic dataset for machine learning on spectroscopic data
- Authors: Jan Schuetzke, Nathan J. Szymanski, Markus Reischl
- Abstract summary: This dataset contains artificial spectra designed to represent experimental measurements from techniques including X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy.
The dataset generation process features customizable parameters, such as scan length and peak count, which can be adjusted to fit the problem at hand.
As an initial benchmark, we simulated a dataset containing 35,000 spectra based on 500 unique classes.
- Score: 0.5801044612920815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To assist in the development of machine learning methods for automated
classification of spectroscopic data, we have generated a universal synthetic
dataset that can be used for model validation. This dataset contains artificial
spectra designed to represent experimental measurements from techniques
including X-ray diffraction, nuclear magnetic resonance, and Raman
spectroscopy. The dataset generation process features customizable parameters,
such as scan length and peak count, which can be adjusted to fit the problem at
hand. As an initial benchmark, we simulated a dataset containing 35,000 spectra
based on 500 unique classes. To automate the classification of this data, eight
different machine learning architectures were evaluated. From the results, we
shed light on which factors are most critical to achieve optimal performance
for the classification task. The scripts used to generate synthetic spectra, as
well as our benchmark dataset and evaluation routines, are made publicly
available to aid in the development of improved machine learning models for
spectroscopic analysis.
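As a rough illustration of how a generator with adjustable scan length and peak count might work, the Python sketch below builds one random peak template per class and then draws noisy variants of it. It is not the authors' published script. The function names, the Gaussian peak shape, the shift, scaling, and width ranges, the default scan length of 5000 points, and the even split of 70 spectra per class (35,000 / 500) are all assumptions made for this example.

```python
import numpy as np


def make_class_templates(n_classes, scan_length, min_peaks, max_peaks, rng):
    """Draw random peak positions and heights for each class (hypothetical scheme)."""
    templates = []
    for _ in range(n_classes):
        n_peaks = rng.integers(min_peaks, max_peaks + 1)
        # Keep peaks away from the scan edges so small shifts stay in range.
        positions = rng.uniform(0.05 * scan_length, 0.95 * scan_length, size=n_peaks)
        heights = rng.uniform(0.1, 1.0, size=n_peaks)
        templates.append((positions, heights))
    return templates


def simulate_spectrum(template, scan_length, rng, max_shift=2.0, width_range=(1.0, 3.0)):
    """Create one variant of a class template as a sum of Gaussian peaks."""
    positions, heights = template
    x = np.arange(scan_length, dtype=float)
    # Assumed variations: small position shifts, intensity scaling, shared peak width.
    shifted = positions + rng.uniform(-max_shift, max_shift, size=positions.shape)
    scaled = heights * rng.uniform(0.5, 1.5, size=heights.shape)
    width = rng.uniform(*width_range)
    peaks = scaled[:, None] * np.exp(-0.5 * ((x[None, :] - shifted[:, None]) / width) ** 2)
    spectrum = peaks.sum(axis=0)
    return spectrum / spectrum.max()  # normalize so the strongest point equals 1


def generate_dataset(n_classes=500, per_class=70, scan_length=5000,
                     min_peaks=2, max_peaks=10, seed=0):
    """Return (X, y): spectra of shape (n_classes * per_class, scan_length) and labels."""
    rng = np.random.default_rng(seed)
    templates = make_class_templates(n_classes, scan_length, min_peaks, max_peaks, rng)
    X = np.empty((n_classes * per_class, scan_length), dtype=np.float32)
    y = np.repeat(np.arange(n_classes), per_class)
    for i, label in enumerate(y):
        X[i] = simulate_spectrum(templates[label], scan_length, rng)
    return X, y


# Small demonstration run; the defaults above would mirror the 35,000 / 500 benchmark size.
X_demo, y_demo = generate_dataset(n_classes=5, per_class=4, scan_length=200, seed=1)
print(X_demo.shape)  # (20, 200)
```

Fixing the seed makes a generated benchmark reproducible, and changing `scan_length` or the peak-count bounds is the kind of adjustment the abstract describes for fitting the data to a given problem.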
Related papers
- Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond [38.32974480709081]
The rapid advent of machine learning (ML) and artificial intelligence (AI) has catalyzed major transformations in chemistry.
The application of these methods to spectroscopic and spectrometric data, referred to as Spectroscopy Machine Learning (SpectraML), remains relatively underexplored.
We provide a unified review of SpectraML, systematically examining state-of-the-art approaches for both forward tasks and inverse tasks.
arXiv Detail & Related papers (2025-02-14T04:07:25Z) - Stellar parameter prediction and spectral simulation using machine learning [0.0]
We applied machine learning to the entire data history of ESO's High Accuracy Radial Velocity Planet Searcher (HARPS) instrument.
We trained standard and variational autoencoders on HARPS data to predict spectral parameters and generate spectra.
Our models excel at predicting spectral parameters and compressing real spectra, and they achieved a mean prediction error of approximately 50 K for effective temperatures.
arXiv Detail & Related papers (2024-12-12T07:09:42Z) - Enhancing radioisotope identification in gamma spectra with transfer learning [0.0]
We pretrain a model using physically derived synthetic data and leverage transfer learning techniques to fine-tune the model for a specific target domain.
Results of this analysis indicate that fine-tuned models significantly outperform those trained exclusively on synthetic data or solely on target-domain data.
This research serves as proof of concept for applying transfer learning techniques to application scenarios where access to experimental data is limited.
arXiv Detail & Related papers (2024-12-10T00:21:00Z) - Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems [0.0]
Synthetic datasets are important for evaluating and testing machine learning models.
We develop a novel framework for generating synthetic datasets that are diverse and statistically coherent.
The framework is available as a free open Python package to facilitate research with minimal friction.
arXiv Detail & Related papers (2024-11-27T09:53:14Z) - Advancing fNIRS Neuroimaging through Synthetic Data Generation and Machine Learning Applications [0.0]
This study presents an integrated approach for advancing functional Near-Infrared Spectroscopy (fNIRS) neuroimaging.
By addressing the scarcity of high-quality neuroimaging datasets, this work harnesses Monte Carlo simulations and parametric head models to generate a comprehensive synthetic dataset.
A cloud-based infrastructure is established for scalable data generation and processing, enhancing the accessibility and quality of neuroimaging data.
arXiv Detail & Related papers (2024-05-18T09:50:19Z) - Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models.
Data generated with SynGround improves the pointing game accuracy of pretrained ALBEF and BLIP models by 4.81 and 17.11 absolute percentage points, respectively.
arXiv Detail & Related papers (2024-03-20T17:59:43Z) - Synthetic Information towards Maximum Posterior Ratio for deep learning
on Imbalanced Data [1.7495515703051119]
We propose a technique for data balancing by generating synthetic data for the minority class.
Our method prioritizes balancing the informative regions by identifying high entropy samples.
Our experimental results on forty-one datasets demonstrate the superior performance of our technique.
arXiv Detail & Related papers (2024-01-05T01:08:26Z) - TarGEN: Targeted Data Generation with Large Language Models [51.87504111286201]
TarGEN is a multi-step prompting strategy for generating high-quality synthetic datasets.
We augment TarGEN with a method known as self-correction, which empowers LLMs to rectify inaccurately labeled instances.
A comprehensive analysis of the synthetic dataset compared to the original dataset reveals similar or higher levels of dataset complexity and diversity.
arXiv Detail & Related papers (2023-10-27T03:32:17Z) - Optimizations of Autoencoders for Analysis and Classification of
Microscopic In Situ Hybridization Images [68.8204255655161]
We propose a deep-learning framework to detect and classify areas of microscopic images with similar levels of gene expression.
The data we analyze requires an unsupervised learning model for which we employ a type of Artificial Neural Network - Deep Learning Autoencoders.
arXiv Detail & Related papers (2023-04-19T13:45:28Z) - Exploring Supervised Machine Learning for Multi-Phase Identification and
Quantification from Powder X-Ray Diffraction Spectra [1.0660480034605242]
Powder X-ray diffraction analysis is a critical component of materials characterization methodologies.
Deep learning has become a prime focus for predicting crystallographic parameters and features from X-ray spectra.
Here, we are interested in conventional supervised learning algorithms in lieu of deep learning for multi-label crystalline phase identification.
arXiv Detail & Related papers (2022-11-16T00:36:13Z) - Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via
Simulation-based Synthetic Data Augmentation and Multitask Learning [4.633997895806144]
We consider quantitative analyses of spectral data using laser-induced breakdown spectroscopy.
We address the small size of training data available, and the validation of the predictions during inference on unknown data.
arXiv Detail & Related papers (2022-10-07T18:00:09Z) - BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot
Detection [63.447493500066045]
This work proposes a data driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z) - Low-complexity deep learning frameworks for acoustic scene
classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments, conducted on the DCASE 2022 Task 1 Development dataset, fulfilled the low-complexity requirement and achieved the best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z) - Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet
Transmission Spectra [68.8204255655161]
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets.
We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations.
We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
arXiv Detail & Related papers (2022-01-07T22:26:33Z) - A parameter refinement method for Ptychography based on Deep Learning
concepts [55.41644538483948]
Coarse parametrisation in propagation distance, position errors and partial coherence frequently threatens the viability of the experiment.
A modern Deep Learning framework is used to autonomously correct the setup incoherences, thus improving the quality of a ptychography reconstruction.
We tested our system on both synthetic datasets and real data acquired at the TwinMic beamline of the Elettra synchrotron facility.
arXiv Detail & Related papers (2021-05-18T10:15:17Z) - A probabilistic deep learning approach to automate the interpretation of
multi-phase diffraction spectra [4.240899165468488]
We develop an ensemble convolutional neural network trained on simulated diffraction spectra to identify complex multi-phase mixtures.
Our model is benchmarked on simulated and experimentally measured diffraction spectra, showing exceptional performance with accuracies exceeding those given by previously reported methods.
arXiv Detail & Related papers (2021-03-30T20:13:01Z)