A robust synthetic data generation framework for machine learning in
High-Resolution Transmission Electron Microscopy (HRTEM)
- URL: http://arxiv.org/abs/2309.06122v1
- Date: Tue, 12 Sep 2023 10:44:15 GMT
- Authors: Luis Rangel DaCosta, Katherine Sytwu, Catherine Groschner, Mary Scott
- Abstract summary: Construction Zone is a Python package for rapidly generating complex nanoscale atomic structures.
We develop an end-to-end workflow for creating large simulated databases for training neural networks.
Using our results, we are able to achieve state-of-the-art segmentation performance on experimental HRTEM images of nanoparticles.
- Score: 1.0923877073891446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning techniques are attractive options for developing
highly-accurate automated analysis tools for nanomaterials characterization,
including high-resolution transmission electron microscopy (HRTEM). However,
successfully implementing such machine learning tools can be difficult due to
the challenges in procuring sufficiently large, high-quality training datasets
from experiments. In this work, we introduce Construction Zone, a Python
package for rapidly generating complex nanoscale atomic structures, and develop
an end-to-end workflow for creating large simulated databases for training
neural networks. Construction Zone enables fast, systematic sampling of
realistic nanomaterial structures, and can be used as a random structure
generator for simulated databases, which is important for generating large,
diverse synthetic datasets. Using HRTEM imaging as an example, we train a
series of neural networks on various subsets of our simulated databases to
segment nanoparticles and holistically study the data curation process to
understand how various aspects of the curated simulated data -- including
simulation fidelity, the distribution of atomic structures, and the
distribution of imaging conditions -- affect model performance across several
experimental benchmarks. Using our results, we are able to achieve
state-of-the-art segmentation performance on experimental HRTEM images of
nanoparticles from several experimental benchmarks and, further, we discuss
robust strategies for consistently achieving high performance with machine
learning in experimental settings using purely synthetic data.
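As a rough illustration of the random-structure-generation idea described in the abstract, the following minimal sketch samples randomly sized and randomly oriented FCC nanoparticles and pairs each with a binary segmentation label. It does not use Construction Zone's API; all function names (`fcc_sphere`, `random_rotation`, `sample_structure`, `disk_mask`, `make_dataset`) and parameter values are hypothetical choices for illustration, the lattice constant is an approximate value for gold, and no HRTEM image simulation is performed.

```python
import numpy as np


def fcc_sphere(a, radius):
    """Atom positions (angstroms) of an FCC crystal cut to a sphere at the origin."""
    n = int(np.ceil(radius / a)) + 1  # enough unit cells to cover the sphere
    cells = np.arange(-n, n + 1)
    # Integer corner coordinates of every cubic unit cell in the block.
    grid = np.stack(
        np.meshgrid(cells, cells, cells, indexing="ij"), axis=-1
    ).reshape(-1, 3)
    # Four-atom FCC basis in fractional coordinates.
    basis = np.array(
        [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
    )
    atoms = (grid[:, None, :] + basis[None, :, :]).reshape(-1, 3) * a
    return atoms[np.linalg.norm(atoms, axis=1) <= radius]


def random_rotation(rng):
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # remove the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis so the result is a rotation, not a reflection
    return q


def sample_structure(rng, a=4.08, radius_range=(10.0, 25.0)):
    """One randomly sized, randomly oriented particle (assumed parameters)."""
    radius = rng.uniform(*radius_range)
    atoms = fcc_sphere(a, radius) @ random_rotation(rng).T
    return atoms, radius


def disk_mask(radius, box=64.0, pixels=128):
    """Binary label: the projection of a centered sphere onto the image plane is a disk."""
    coords = (np.arange(pixels) + 0.5) * (box / pixels) - box / 2
    x, y = np.meshgrid(coords, coords, indexing="xy")
    return (x**2 + y**2) <= radius**2


def make_dataset(n, seed=0):
    """Generate n (structure, label) pairs from a seeded random generator."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n):
        atoms, radius = sample_structure(rng)
        samples.append({"atoms": atoms, "mask": disk_mask(radius)})
    return samples


dataset = make_dataset(4)
```

In a full workflow of the kind the abstract describes, each sampled structure would additionally be passed through an HRTEM image simulation under sampled imaging conditions; that step is omitted here.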
Related papers
- MBDS: A Multi-Body Dynamics Simulation Dataset for Graph Networks Simulators [4.5353840616537555]
Graph Network Simulators (GNS) have emerged as the leading method for modeling physical phenomena.
We have constructed a high-quality physical simulation dataset encompassing 1D, 2D, and 3D scenes.
A key feature of our dataset is the inclusion of precise multi-body dynamics, facilitating a more realistic simulation of the physical world.
arXiv Detail & Related papers (2024-10-04T03:03:06Z)
- Advancing fNIRS Neuroimaging through Synthetic Data Generation and Machine Learning Applications [0.0]
This study presents an integrated approach for advancing functional Near-Infrared Spectroscopy (fNIRS) neuroimaging.
By addressing the scarcity of high-quality neuroimaging datasets, this work harnesses Monte Carlo simulations and parametric head models to generate a comprehensive synthetic dataset.
A cloud-based infrastructure is established for scalable data generation and processing, enhancing the accessibility and quality of neuroimaging data.
arXiv Detail & Related papers (2024-05-18T09:50:19Z)
- Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions [48.0590120095748]
We propose a novel neural architecture search (NAS) framework named AutoFM, which can automatically search for the optimal model architectures for encoding diverse input modalities and fusion strategies.
We conduct thorough experiments on real-world multi-modal EHR data and prediction tasks, and the results demonstrate that our framework achieves significant performance improvement over existing state-of-the-art methods.
arXiv Detail & Related papers (2024-01-20T15:14:14Z)
- Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics Simulation [1.6685829157403116]
This paper presents a methodological framework for training, self-optimising, and self-organising surrogate models.
We show that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately.
arXiv Detail & Related papers (2023-09-22T20:52:50Z)
- Machine learning enabled experimental design and parameter estimation for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED).
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z)
- Addressing computational challenges in physical system simulations with machine learning [0.0]
We present a machine learning-based data generator framework tailored to aid researchers who utilize simulations to examine various physical systems or processes.
Our approach involves a two-step process: first, we train a supervised predictive model using a limited simulated dataset to predict simulation outcomes.
Subsequently, a reinforcement learning agent is trained to generate accurate, simulation-like data by leveraging the supervised model.
arXiv Detail & Related papers (2023-05-16T17:31:50Z)
- Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for accelerating stochastic simulations.
For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP-based models.
The results demonstrate that STNP outperforms the baselines in the learning setting and that LIG achieves state-of-the-art performance for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
- Deep Transformer Networks for Time Series Classification: The NPP Safety Case [59.20947681019466]
An advanced temporal neural network referred to as the Transformer is used in a supervised learning fashion to model the time-dependent NPP simulation data.
The Transformer can learn the characteristics of the sequential data and yield promising performance with approximately 99% classification accuracy on the testing dataset.
arXiv Detail & Related papers (2021-04-09T14:26:25Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Intelligent multiscale simulation based on process-guided composite database [0.0]
We present an integrated data-driven modeling framework based on process modeling, material homogenization, and machine learning.
We are interested in injection-molded short-fiber-reinforced composites, which have been identified as key material systems in the automotive, aerospace, and electronics industries.
arXiv Detail & Related papers (2020-03-20T20:39:19Z)
- Learning to Simulate Complex Physics with Graph Networks [68.43901833812448]
We present a machine learning framework and model implementation that can learn to simulate a wide variety of challenging physical domains.
Our framework, which we term "Graph Network-based Simulators" (GNS), represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing.
Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time.
arXiv Detail & Related papers (2020-02-21T16:44:28Z)