A simulation-based training framework for machine-learning applications in ARPES
- URL: http://arxiv.org/abs/2508.15983v1
- Date: Thu, 21 Aug 2025 21:59:09 GMT
- Title: A simulation-based training framework for machine-learning applications in ARPES
- Authors: MengXing Na, Chris Zhou, Sydney K. Y. Dufresne, Matteo Michiardi, Andrea Damascelli,
- Abstract summary: We introduce an open-source synthetic ARPES spectra simulator - aurelia - for generating the large datasets necessary to train machine learning models.<n>We benchmark the simulation-trained model against actual experimental data and find that it can assess the spectra quality more accurately than human analysis.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, angle-resolved photoemission spectroscopy (ARPES) has advanced significantly in its ability to probe more observables and simultaneously generate multi-dimensional datasets. These advances present new challenges in data acquisition, processing, and analysis. Machine learning (ML) models can drastically reduce the workload of experimentalists; however, the lack of training data for ML -- and in particular deep learning -- is a significant obstacle. In this work, we introduce an open-source synthetic ARPES spectra simulator - aurelia - for the purpose of generating the large datasets necessary to train ML models. As a demonstration, we train a convolutional neural network to evaluate ARPES spectra quality -- a critical task performed during the initial sample alignment phase of the experiment. We benchmark the simulation-trained model against actual experimental data and find that it can assess the spectra quality more accurately than human analysis, and swiftly identify the optimal measurement region with high precision. Thus, we establish that simulated ARPES spectra can be an effective proxy for experimental spectra in training ML models.
Related papers
- Stellar parameter prediction and spectral simulation using machine learning [0.0]
We applied machine learning to the entire data history of ESO's High Accuracy Radial Velocity Planet Searcher (HARPS) instrument.<n>We trained standard and variational autoencoders on HARPS data to predict spectral parameters and generate spectra.<n>Our models excel at predicting spectral parameters and compressing real spectra, and they achieved a mean prediction error of approximately 50 K for effective temperatures.
arXiv Detail & Related papers (2024-12-12T07:09:42Z) - Enhancing radioisotope identification in gamma spectra via supervised domain adaptation [0.0]
This study explores how supervised domain adaptation techniques can improve radioisotope identification models.<n>We begin by pretraining a model for radioisotope identification using data from a synthetic source domain, and then fine-tune it for a new target domain.<n>Our analysis indicates that fine-tuned models significantly outperform those trained exclusively on source-domain data or solely on target-domain data.
arXiv Detail & Related papers (2024-12-10T00:21:00Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Closing the loop: Autonomous experiments enabled by
machine-learning-based online data analysis in synchrotron beamline
environments [80.49514665620008]
Machine learning can be used to enhance research involving large or rapidly generated datasets.
In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR)
We present solutions that provide an elementary data analysis in real time during the experiment without introducing the additional software dependencies in the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z) - Machine learning enabled experimental design and parameter estimation
for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED)
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Lamarr: LHCb ultra-fast simulation based on machine learning models deployed within Gauss [0.0]
We discuss Lamarr, a framework to speed-up the simulation production parameterizing both the detector response and the reconstruction algorithms of the LHCb experiment.
Deep Generative Models powered by several algorithms and strategies are employed to effectively parameterize the high-level response of the single components of the LHCb detector.
arXiv Detail & Related papers (2023-03-20T20:18:04Z) - Continual learning autoencoder training for a particle-in-cell
simulation via streaming [52.77024349608834]
upcoming exascale era will provide a new generation of physics simulations with high resolution.
These simulations will have a high resolution, which will impact the training of machine learning models since storing a high amount of simulation data on disk is nearly impossible.
This work presents an approach that trains a neural network concurrently to a running simulation without data on a disk.
arXiv Detail & Related papers (2022-11-09T09:55:14Z) - Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via
Simulation-based Synthetic Data Augmentation and Multitask Learning [4.633997895806144]
We consider quantitative analyses of spectral data using laser-induced breakdown spectroscopy.
We address the small size of training data available, and the validation of the predictions during inference on unknown data.
arXiv Detail & Related papers (2022-10-07T18:00:09Z) - Cognitive simulation models for inertial confinement fusion: Combining
simulation and experimental data [0.0]
Researchers rely heavily on computer simulations to explore the design space in search of high-performing implosions.
For more effective design and investigation, simulations require input from past experimental data to better predict future performance.
We describe a cognitive simulation method for combining simulation and experimental data into a common, predictive model.
arXiv Detail & Related papers (2021-03-19T02:00:14Z) - Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.