Scaling Law of Sim2Real Transfer Learning in Expanding Computational Materials Databases for Real-World Predictions
- URL: http://arxiv.org/abs/2408.04042v1
- Date: Wed, 7 Aug 2024 18:47:58 GMT
- Title: Scaling Law of Sim2Real Transfer Learning in Expanding Computational Materials Databases for Real-World Predictions
- Authors: Shunya Minami, Yoshihiro Hayashi, Stephen Wu, Kenji Fukumizu, Hiroki Sugisawa, Masashi Ishii, Isao Kuwajima, Kazuya Shiratori, Ryo Yoshida
- Abstract summary: Fine-tuning a predictor pretrained on a computational database to a real system can result in models with outstanding generalization capabilities.
This study demonstrates the scaling law of simulation-to-real (Sim2Real) transfer learning for several machine learning tasks in materials science.
- Score: 13.20562263181952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To address the challenge of limited experimental materials data, extensive physical property databases are being developed based on high-throughput computational experiments, such as molecular dynamics simulations. Previous studies have shown that fine-tuning a predictor pretrained on a computational database to a real system can result in models with outstanding generalization capabilities compared to learning from scratch. This study demonstrates the scaling law of simulation-to-real (Sim2Real) transfer learning for several machine learning tasks in materials science. Case studies of three prediction tasks for polymers and inorganic materials reveal that the prediction error on real systems decreases according to a power law as the size of the computational data increases. Observing the scaling behavior offers various insights for database development, such as determining the sample size necessary to achieve a desired performance, identifying equivalent sample sizes for physical and computational experiments, and guiding the design of data production protocols for downstream real-world tasks.
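As a rough illustration of how a power-law scaling curve can be used to estimate the computational sample size needed for a target error, here is a minimal Python sketch. It assumes a pure power law, error = D * n^(-alpha), with no irreducible error term; that functional form, the helper names `fit_power_law` and `required_size`, and the synthetic numbers are all assumptions made for this example and are not taken from the paper.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error = D * n**(-alpha) by least squares in log-log space.

    Assumed model (not from the paper): log(error) = log(D) - alpha * log(n).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx                      # slope of the log-log line
    alpha = -slope                         # decay exponent
    D = math.exp(y_mean - slope * x_mean)  # prefactor from the intercept
    return D, alpha

def required_size(D, alpha, target_error):
    """Invert error = D * n**(-alpha) to get the n that reaches target_error."""
    return (D / target_error) ** (1.0 / alpha)

# Synthetic, noise-free illustration data (hypothetical, not measured results).
sizes = [100, 1000, 10000, 100000]
errors = [2.0 * n ** -0.25 for n in sizes]

D, alpha = fit_power_law(sizes, errors)
n_needed = required_size(D, alpha, target_error=0.1)
print(f"D={D:.3f}, alpha={alpha:.3f}, n_needed={n_needed:.0f}")
```

With real learning curves the error usually flattens toward a nonzero floor, so a fit with an additive constant (for example via nonlinear least squares) would likely be more appropriate than this log-log regression.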
Related papers
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
- Addressing computational challenges in physical system simulations with machine learning [0.0]
We present a machine learning-based data generator framework tailored to aid researchers who utilize simulations to examine various physical systems or processes.
Our approach involves a two-step process: first, we train a supervised predictive model using a limited simulated dataset to predict simulation outcomes.
Subsequently, a reinforcement learning agent is trained to generate accurate, simulation-like data by leveraging the supervised model.
arXiv Detail & Related papers (2023-05-16T17:31:50Z)
- An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design [55.62660894625669]
Lithography modeling is a crucial problem in chip design to ensure a chip design mask is manufacturable.
Recent developments in machine learning have provided alternative solutions in replacing the time-consuming lithography simulations with deep neural networks.
We propose a litho-aware data augmentation framework to resolve the dilemma of limited data and improve the machine learning model performance.
arXiv Detail & Related papers (2022-10-27T20:53:39Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Physical Systems Modeled Without Physical Laws [0.0]
Tree-based machine learning methods can emulate desired outputs without "knowing" the complex backing involved in the simulations.
We specifically focus on predicting specific spatial-temporal data between two simulation outputs and increasing spatial resolution to generalize the physics predictions to finer test grids without the computational costs of repeating the numerical calculation.
arXiv Detail & Related papers (2022-07-26T20:51:20Z)
- A Scaling Law for Synthetic-to-Real Transfer: A Measure of Pre-Training [52.93808218720784]
Synthetic-to-real transfer learning is a framework in which we pre-train models with synthetically generated images and ground-truth annotations for real tasks.
Although synthetic images overcome the data scarcity issue, it remains unclear how the fine-tuning performance scales with pre-trained models.
We observe a simple and general scaling law that consistently describes learning curves in various tasks, models, and complexities of synthesized pre-training data.
arXiv Detail & Related papers (2021-08-25T02:29:28Z)
- Cognitive simulation models for inertial confinement fusion: Combining simulation and experimental data [0.0]
Researchers rely heavily on computer simulations to explore the design space in search of high-performing implosions.
For more effective design and investigation, simulations require input from past experimental data to better predict future performance.
We describe a cognitive simulation method for combining simulation and experimental data into a common, predictive model.
arXiv Detail & Related papers (2021-03-19T02:00:14Z)
- Integrating Machine Learning with HPC-driven Simulations for Enhanced Student Learning [0.0]
We develop a web application that supports both HPC-driven simulation and the ML surrogate methods to produce simulation outputs.
The evaluation of the tool via in-classroom student feedback and surveys shows that the ML-enhanced tool provides a dynamic and responsive simulation environment.
arXiv Detail & Related papers (2020-08-24T22:48:21Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Improving neural network predictions of material properties with limited data using transfer learning [3.2851683371946754]
We develop new transfer learning algorithms to accelerate prediction of material properties from ab initio simulations.
Transfer learning has been successfully utilized for data-efficient modeling in applications other than materials science.
arXiv Detail & Related papers (2020-06-29T22:34:30Z)
- Predictive modeling approaches in laser-based material processing [59.04160452043105]
This study aims to automate and forecast the effect of laser processing on material structures.
The focus is centred on the performance of representative statistical and machine learning algorithms.
Results can set the basis for a systematic methodology towards reducing material design, testing and production cost.
arXiv Detail & Related papers (2020-06-13T17:28:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.