Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via
Simulation-based Synthetic Data Augmentation and Multitask Learning
- URL: http://arxiv.org/abs/2210.03762v1
- Date: Fri, 7 Oct 2022 18:00:09 GMT
- Title: Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via
Simulation-based Synthetic Data Augmentation and Multitask Learning
- Authors: Riccardo Finotello, Daniel L'Hermite, Celine Quéré, Benjamin
Rouge, Mohamed Tamaazousti, Jean-Baptiste Sirven
- Abstract summary: We consider quantitative analyses of spectral data using laser-induced breakdown spectroscopy.
We address the small size of training data available, and the validation of the predictions during inference on unknown data.
- Score: 4.633997895806144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider quantitative analyses of spectral data using laser-induced
breakdown spectroscopy. We address the small size of training data available,
and the validation of the predictions during inference on unknown data. For
this purpose, we build robust calibration models using deep convolutional multitask
learning architectures to predict the concentration of the analyte, alongside
additional spectral information as auxiliary outputs. These secondary
predictions can be used to validate the trustworthiness of the model by taking
advantage of the mutual dependencies of the parameters of the multitask neural
networks. Due to the lack of experimental training samples, we introduce a
simulation-based data augmentation process to synthesise an arbitrary number of
spectra, statistically representative of the experimental data. Given the
nature of the deep learning model, no dimensionality reduction or data
selection processes are required. The procedure is an end-to-end pipeline
including the process of synthetic data augmentation, the construction of a
suitable robust, homoscedastic, deep learning model, and the validation of its
predictions. In the article, we compare the performance of the multitask model
with traditional univariate and multivariate analyses, to highlight the
separate contributions of each element introduced in the process.
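
The augmentation and validation ideas in the abstract can be illustrated with a toy sketch. It is not the authors' pipeline: the Gaussian line model, the additive noise, the choice of integrated intensity as the auxiliary target, and the names `simulate_spectrum`, `augment`, and `is_trustworthy` are all assumptions made for illustration. The sketch synthesises an arbitrary number of labelled spectra, then flags a prediction as suspect when an auxiliary output disagrees with the same quantity measured directly on the spectrum.

```python
import math
import random


def simulate_spectrum(concentration, wavelengths, lines, noise_std, rng):
    """Toy spectrum: Gaussian emission lines scaled by concentration, plus noise.

    `lines` is a list of (centre, width, strength) tuples (hypothetical values).
    """
    spectrum = []
    for w in wavelengths:
        signal = sum(
            concentration * strength * math.exp(-0.5 * ((w - centre) / width) ** 2)
            for centre, width, strength in lines
        )
        spectrum.append(signal + rng.gauss(0.0, noise_std))
    return spectrum


def augment(n_spectra, wavelengths, lines, noise_std=0.01, seed=0):
    """Synthesise `n_spectra` samples as (spectrum, concentration, auxiliary) tuples."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_spectra):
        concentration = rng.uniform(0.0, 1.0)  # main regression target
        spectrum = simulate_spectrum(concentration, wavelengths, lines, noise_std, rng)
        auxiliary = sum(spectrum)  # assumed auxiliary target: integrated intensity
        dataset.append((spectrum, concentration, auxiliary))
    return dataset


def is_trustworthy(predicted_aux, measured_aux, tolerance):
    """Accept a prediction only if the auxiliary output matches the measured value."""
    return abs(predicted_aux - measured_aux) <= tolerance
```

A multitask model trained on such a dataset would predict both the concentration and the auxiliary quantity. At inference time the auxiliary quantity can be computed from the unknown spectrum, so a large mismatch with the model's auxiliary output suggests the concentration estimate should not be trusted either.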
Related papers
- On the Diversity of Synthetic Data and its Impact on Training Large Language Models [34.00031258223175]
Large Language Models (LLMs) have accentuated the need for diverse, high-quality pre-training data.
Synthetic data emerges as a viable solution to the challenges of data scarcity and inaccessibility.
We study the downstream effects of synthetic data diversity during both the pre-training and fine-tuning stages.
arXiv Detail & Related papers (2024-10-19T22:14:07Z)
- LMD3: Language Model Data Density Dependence [78.76731603461832]
We develop a methodology for analyzing language model task performance at the individual example level based on training data density estimation.
Experiments with paraphrasing as a controlled intervention on finetuning data demonstrate that increasing the support in the training distribution for specific test queries results in a measurable increase in density.
We conclude that our framework can provide statistical evidence of the dependence of a target model's predictions on subsets of its training data.
arXiv Detail & Related papers (2024-05-10T09:03:27Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Machine learning enabled experimental design and parameter estimation
for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED).
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z)
- Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty
Quantification [3.175239447683357]
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method.
The framework focuses on uncertainty quantification in complex data scenarios, particularly involving unstructured data.
We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
arXiv Detail & Related papers (2023-05-30T01:01:36Z)
- Sequential Experimental Design for Spectral Measurement: Active Learning
Using a Parametric Model [1.9377646956063705]
In spectral measurements, it is necessary to reduce the measurement time because of sample fragility and high energy costs.
In this study, we demonstrate a sequential experimental design for spectral measurements by active learning using parametric models as predictors.
arXiv Detail & Related papers (2023-05-11T13:21:26Z)
- Spectrum-BERT: Pre-training of Deep Bidirectional Transformers for
Spectral Classification of Chinese Liquors [0.0]
We propose a pre-training method of deep bidirectional transformers for spectral classification of Chinese liquors, abbreviated as Spectrum-BERT.
We elaborately design two pre-training tasks, Next Curve Prediction (NCP) and Masked Curve Model (MCM), so that the model can effectively utilize unlabeled samples.
In the comparative experiments, the proposed Spectrum-BERT significantly outperforms the baselines in multiple metrics.
arXiv Detail & Related papers (2022-10-22T13:11:25Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the
Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- A probabilistic deep learning approach to automate the interpretation of
multi-phase diffraction spectra [4.240899165468488]
We develop an ensemble convolutional neural network trained on simulated diffraction spectra to identify complex multi-phase mixtures.
Our model is benchmarked on simulated and experimentally measured diffraction spectra, showing exceptional performance with accuracies exceeding those given by previously reported methods.
arXiv Detail & Related papers (2021-03-30T20:13:01Z)
- Predictive modeling approaches in laser-based material processing [59.04160452043105]
This study aims to automate and forecast the effect of laser processing on material structures.
The focus is centred on the performance of representative statistical and machine learning algorithms.
Results can set the basis for a systematic methodology towards reducing material design, testing and production cost.
arXiv Detail & Related papers (2020-06-13T17:28:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.