A universal synthetic dataset for machine learning on spectroscopic data
- URL: http://arxiv.org/abs/2206.06031v2
- Date: Tue, 14 Jun 2022 09:25:53 GMT
- Title: A universal synthetic dataset for machine learning on spectroscopic data
- Authors: Jan Schuetzke, Nathan J. Szymanski, Markus Reischl
- Abstract summary: This dataset contains artificial spectra designed to represent experimental measurements from techniques including X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy.
The dataset generation process features customizable parameters, such as scan length and peak count, which can be adjusted to fit the problem at hand.
As an initial benchmark, we simulated a dataset containing 35,000 spectra based on 500 unique classes.
- Score: 0.5801044612920815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To assist in the development of machine learning methods for automated
classification of spectroscopic data, we have generated a universal synthetic
dataset that can be used for model validation. This dataset contains artificial
spectra designed to represent experimental measurements from techniques
including X-ray diffraction, nuclear magnetic resonance, and Raman
spectroscopy. The dataset generation process features customizable parameters,
such as scan length and peak count, which can be adjusted to fit the problem at
hand. As an initial benchmark, we simulated a dataset containing 35,000 spectra
based on 500 unique classes. To automate the classification of this data, eight
different machine learning architectures were evaluated. From the results, we
shed light on which factors are most critical to achieve optimal performance
for the classification task. The scripts used to generate synthetic spectra, as
well as our benchmark dataset and evaluation routines, are made publicly
available to aid in the development of improved machine learning models for
spectroscopic analysis.
Related papers
- Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond [38.32974480709081]
The rapid advent of machine learning (ML) and artificial intelligence (AI) has catalyzed major transformations in chemistry.
The application of these methods to spectroscopic and spectrometric data, referred to as Spectroscopy Machine Learning (SpectraML), remains relatively underexplored.
We provide a unified review of SpectraML, systematically examining state-of-the-art approaches for both forward tasks and inverse tasks.
(2025-02-14T04:07:25Z)
- Stellar parameter prediction and spectral simulation using machine learning [0.0]
We applied machine learning to the entire data history of ESO's High Accuracy Radial Velocity Planet Searcher (HARPS) instrument.
We trained standard and variational autoencoders on HARPS data to predict spectral parameters and generate spectra.
Our models excel at predicting spectral parameters and compressing real spectra, and they achieved a mean prediction error of approximately 50 K for effective temperatures.
(2024-12-12T07:09:42Z)
- Enhancing radioisotope identification in gamma spectra with transfer learning [0.0]
We pretrain a model using physically derived synthetic data and leverage transfer learning techniques to fine-tune the model for a specific target domain.
Results of this analysis indicate that fine-tuned models significantly outperform those trained exclusively on synthetic data or solely on target-domain data.
This research serves as proof of concept for applying transfer learning techniques to application scenarios where access to experimental data is limited.
(2024-12-10T00:21:00Z)
- Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems [0.0]
Synthetic datasets are important for evaluating and testing machine learning models.
We develop a novel framework for generating synthetic datasets that are diverse and statistically coherent.
The framework is available as a free open Python package to facilitate research with minimal friction.
(2024-11-27T09:53:14Z)
- Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models.
Data generated with SynGround improves the pointing game accuracy of pretrained ALBEF and BLIP models by 4.81 and 17.11 absolute percentage points, respectively.
(2024-03-20T17:59:43Z)
- TarGEN: Targeted Data Generation with Large Language Models [51.87504111286201]
TarGEN is a multi-step prompting strategy for generating high-quality synthetic datasets.
We augment TarGEN with a method known as self-correction, which empowers LLMs to rectify inaccurately labeled instances.
A comprehensive analysis of the synthetic dataset compared to the original dataset reveals similar or higher levels of dataset complexity and diversity.
(2023-10-27T03:32:17Z)
- Optimizations of Autoencoders for Analysis and Classification of Microscopic In Situ Hybridization Images [68.8204255655161]
We propose a deep-learning framework to detect and classify areas of microscopic images with similar levels of gene expression.
The data we analyze requires an unsupervised learning model, for which we employ a type of artificial neural network: deep learning autoencoders.
(2023-04-19T13:45:28Z)
- Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via Simulation-based Synthetic Data Augmentation and Multitask Learning [4.633997895806144]
We consider quantitative analyses of spectral data using laser-induced breakdown spectroscopy.
We address the small size of training data available, and the validation of the predictions during inference on unknown data.
(2022-10-07T18:00:09Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments, conducted on the DCASE 2022 Task 1 Development dataset, fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
(2022-06-13T11:41:39Z)
- Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra [68.8204255655161]
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets.
We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations.
We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
(2022-01-07T22:26:33Z)
- A parameter refinement method for Ptychography based on Deep Learning concepts [55.41644538483948]
Coarse parametrisation of the propagation distance, position errors, and partial coherence frequently threatens the viability of the experiment.
A modern Deep Learning framework is used to autonomously correct setup incoherences, thus improving the quality of a ptychography reconstruction.
We tested our system on both synthetic datasets and real data acquired at the TwinMic beamline of the Elettra synchrotron facility.
(2021-05-18T10:15:17Z)