Related papers: A robust synthetic data generation framework for machine learning in High-Resolution Transmission Electron Microscopy (HRTEM)

A robust synthetic data generation framework for machine learning in High-Resolution Transmission Electron Microscopy (HRTEM)

URL: http://arxiv.org/abs/2309.06122v1
Date: Tue, 12 Sep 2023 10:44:15 GMT
Title: A robust synthetic data generation framework for machine learning in High-Resolution Transmission Electron Microscopy (HRTEM)
Authors: Luis Rangel DaCosta, Katherine Sytwu, Catherine Groschner, Mary Scott
Abstract summary: Construction Zone is a Python package for rapidly generating complex nanoscale atomic structures. We develop an end-to-end workflow for creating large simulated databases for training neural networks. Using our results, we are able to achieve state-of-the-art segmentation performance on experimental HRTEM images of nanoparticles.
Score: 1.0923877073891446
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine learning techniques are attractive options for developing highly-accurate automated analysis tools for nanomaterials characterization, including high-resolution transmission electron microscopy (HRTEM). However, successfully implementing such machine learning tools can be difficult due to the challenges in procuring sufficiently large, high-quality training datasets from experiments. In this work, we introduce Construction Zone, a Python package for rapidly generating complex nanoscale atomic structures, and develop an end-to-end workflow for creating large simulated databases for training neural networks. Construction Zone enables fast, systematic sampling of realistic nanomaterial structures, and can be used as a random structure generator for simulated databases, which is important for generating large, diverse synthetic datasets. Using HRTEM imaging as an example, we train a series of neural networks on various subsets of our simulated databases to segment nanoparticles and holistically study the data curation process to understand how various aspects of the curated simulated data -- including simulation fidelity, the distribution of atomic structures, and the distribution of imaging conditions -- affect model performance across several experimental benchmarks. Using our results, we are able to achieve state-of-the-art segmentation performance on experimental HRTEM images of nanoparticles from several experimental benchmarks and, further, we discuss robust strategies for consistently achieving high performance with machine learning in experimental settings using purely synthetic data.

Related papers

Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems [0.0]
Synthetic datasets are important for evaluating and testing machine learning models. We develop a novel framework for generating synthetic datasets that are diverse and statistically coherent. The framework is available as a free open Python package to facilitate research with minimal friction.
arXiv Detail & Related papers (2024-11-27T09:53:14Z)
MBDS: A Multi-Body Dynamics Simulation Dataset for Graph Networks Simulators [4.5353840616537555]
Graph Network Simulators (GNS) have emerged as the leading method for modeling physical phenomena. We have constructed a high-quality physical simulation dataset encompassing 1D, 2D, and 3D scenes. A key feature of our dataset is the inclusion of precise multi-body dynamics, facilitating a more realistic simulation of the physical world.
arXiv Detail & Related papers (2024-10-04T03:03:06Z)
Benchmark on Drug Target Interaction Modeling from a Structure Perspective [48.60648369785105]
Drug-target interaction prediction is crucial to drug discovery and design. Recent methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets. We conduct a comprehensive survey and benchmark for drug-target interaction modeling from a structure perspective, via integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms.
arXiv Detail & Related papers (2024-07-04T16:56:59Z)
Advancing fNIRS Neuroimaging through Synthetic Data Generation and Machine Learning Applications [0.0]
This study presents an integrated approach for advancing functional Near-Infrared Spectroscopy (fNIRS) neuroimaging. By addressing the scarcity of high-quality neuroimaging datasets, this work harnesses Monte Carlo simulations and parametric head models to generate a comprehensive synthetic dataset. A cloud-based infrastructure is established for scalable data generation and processing, enhancing the accessibility and quality of neuroimaging data.
arXiv Detail & Related papers (2024-05-18T09:50:19Z)
Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions [48.0590120095748]
We propose a novel neural architecture search (NAS) framework named AutoFM, which can automatically search for the optimal model architectures for encoding diverse input modalities and fusion strategies. We conduct thorough experiments on real-world multi-modal EHR data and prediction tasks, and the results demonstrate that our framework achieves significant performance improvement over existing state-of-the-art methods.
arXiv Detail & Related papers (2024-01-20T15:14:14Z)
Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics Simulation [1.6685829157403116]
This paper presents a methodological framework for training, self-optimising, and self-organising surrogate models. We show that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately.
arXiv Detail & Related papers (2023-09-22T20:52:50Z)
Machine learning enabled experimental design and parameter estimation for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED) Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED. Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z)
Addressing computational challenges in physical system simulations with machine learning [0.0]
We present a machine learning-based data generator framework tailored to aid researchers who utilize simulations to examine various physical systems or processes. Our approach involves a two-step process: first, we train a supervised predictive model using a limited simulated dataset to predict simulation outcomes. Subsequently, a reinforcement learning agent is trained to generate accurate, simulation-like data by leveraging the supervised model.
arXiv Detail & Related papers (2023-05-16T17:31:50Z)
Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for simulations and with active learning approaches. For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models. The results demonstrate STNP outperforms the baselines in the learning setting and LIG achieves the state-of-the-art for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
Deep Transformer Networks for Time Series Classification: The NPP Safety Case [59.20947681019466]
An advanced temporal neural network referred to as the Transformer is used within a supervised learning fashion to model the time-dependent NPP simulation data. The Transformer can learn the characteristics of the sequential data and yield promising performance with approximately 99% classification accuracy on the testing dataset.
arXiv Detail & Related papers (2021-04-09T14:26:25Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
Intelligent multiscale simulation based on process-guided composite database [0.0]
We present an integrated data-driven modeling framework based on process modeling, material homogenization, and machine learning. We are interested in the injection-molded short fiber reinforced composites, which have been identified as key material systems in automotive, aerospace, and electronics industries.
arXiv Detail & Related papers (2020-03-20T20:39:19Z)
Learning to Simulate Complex Physics with Graph Networks [68.43901833812448]
We present a machine learning framework and model implementation that can learn to simulate a wide variety of challenging physical domains. Our framework---which we term "Graph Network-based Simulators" (GNS)--represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing. Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time.
arXiv Detail & Related papers (2020-02-21T16:44:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.