TwinLab: a framework for data-efficient training of non-intrusive reduced-order models for digital twins
- URL: http://arxiv.org/abs/2407.03924v1
- Date: Thu, 4 Jul 2024 13:37:13 GMT
- Title: TwinLab: a framework for data-efficient training of non-intrusive reduced-order models for digital twins
- Authors: Maximilian Kannapinn, Michael Schäfer, Oliver Weeger
- Abstract summary: This study presents TwinLab, a framework for data-efficient, yet accurate training of neural-ODE type reduced-order models with only two data sets.
Having found the single best data set for training, a second data set is sought with the help of similarity and error measures to effectively enrich the training process.
Findings: Adding a suitable second training data set reduces the test error by up to 49% compared to the best base reduced-order model trained with only one data set.
- Score: 0.7863035464831994
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Purpose: Simulation-based digital twins represent an effort to provide high-accuracy real-time insights into operational physical processes. However, the computation time of many multi-physical simulation models is far from real-time. It might even exceed sensible time frames to produce sufficient data for training data-driven reduced-order models. This study presents TwinLab, a framework for data-efficient, yet accurate training of neural-ODE type reduced-order models with only two data sets.
Design/methodology/approach: Correlations between test errors of reduced-order models and distinct features of the corresponding training data are investigated. Having found the single best data set for training, a second data set is sought with the help of similarity and error measures to effectively enrich the training process.
Findings: Adding a suitable second training data set reduces the test error by up to 49% compared to the best base reduced-order model trained with only one data set. Such a second training data set should at least yield a good reduced-order model on its own and exhibit a higher level of dissimilarity to the base training data set regarding the respective excitation signal. Moreover, the base reduced-order model should have elevated test errors on the second data set. The relative error of the time series ranges from 0.18% to 0.49%, and prediction speed-ups of up to a factor of 36,000 are observed.
Originality: The proposed computational framework facilitates the automated, data-efficient extraction of non-intrusive reduced-order models for digital twins from existing simulation models, independent of the simulation software.
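The two-step data selection the abstract describes can be made concrete. The sketch below is a minimal illustration, not the authors' implementation: the dissimilarity measure (one minus the absolute Pearson correlation of the excitation signals), the `test_error` interface, and all names are assumptions for exposition.

```python
import numpy as np

def dissimilarity(excitation_a: np.ndarray, excitation_b: np.ndarray) -> float:
    """Dissimilarity of two excitation signals: 1 - |Pearson correlation|.
    (Illustrative choice; the paper combines several similarity and error measures.)"""
    r = np.corrcoef(excitation_a, excitation_b)[0, 1]
    return 1.0 - abs(r)

def select_second_dataset(base_rom, base_excitation, candidates):
    """Rank candidate data sets for enriching the training of a base ROM.

    Scores each candidate by how strongly its excitation differs from the
    base training signal AND how poorly the base ROM performs on it --
    the two criteria the abstract identifies for a useful second set.
    `base_rom.test_error(ds)` is a hypothetical interface returning the
    base model's relative test error on a data set `ds`.
    """
    def score(ds):
        return dissimilarity(base_excitation, ds.excitation) * base_rom.test_error(ds)
    return max(candidates, key=score)
```

Multiplying the two criteria is one plausible way to combine them; the paper itself studies them via correlations rather than prescribing a single scalar score.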
Related papers
- Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z)
- Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing.
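Combining selection and mixing in this way resembles a multi-armed bandit over data domains. The following is a hedged, self-contained sketch of that idea, assuming the per-batch training loss serves as the reward; `sample_batch` and `train_step` are hypothetical interfaces, and the update rule is illustrative rather than the paper's exact algorithm.

```python
import numpy as np

def online_data_mixing(domains, train_step, steps=1000, lr=0.1, seed=0):
    """Adaptively choose which data domain to draw each batch from.

    Maintains one log-weight per domain; domains whose batches currently
    yield high loss (i.e., are most informative) get sampled more often.
    `train_step(batch)` is a hypothetical callback that performs one
    optimizer step and returns the batch loss as a float.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(len(domains))                    # log-weights per domain
    for _ in range(steps):
        p = np.exp(w - w.max())                   # softmax mixing distribution
        p /= p.sum()
        i = rng.choice(len(domains), p=p)         # pick a domain to sample
        loss = train_step(domains[i].sample_batch())
        w[i] += lr * loss / p[i]                  # importance-weighted update
    return p                                      # learned mixing proportions
```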
arXiv Detail & Related papers (2023-12-05T00:42:35Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- On minimizing the training set fill distance in machine learning regression [0.552480439325792]
We study a data selection approach that aims to minimize the fill distance of the selected set.
We show that selecting training sets with farthest point sampling (FPS) can also increase model stability in the specific case of Gaussian kernel regression.
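For context, farthest point sampling admits a compact greedy implementation that approximately minimizes the fill distance of the selected set. This is a generic sketch, not the paper's code:

```python
import numpy as np

def farthest_point_sampling(X: np.ndarray, k: int) -> np.ndarray:
    """Greedily select k rows of X so that each new point is the one
    farthest from the points chosen so far, shrinking the fill distance
    of the selected set. Returns the selected row indices."""
    idx = [0]                                    # arbitrary starting point
    d = np.linalg.norm(X - X[0], axis=1)         # distance to selected set
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                  # farthest remaining point
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(idx)

# Example: pick 10 of 1000 random 2-D points as a space-filling training set.
X = np.random.default_rng(1).random((1000, 2))
train_idx = farthest_point_sampling(X, 10)
```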
arXiv Detail & Related papers (2023-07-20T16:18:33Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present BOOT, a novel technique that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- CLIP: Train Faster with Less Data [3.2575001434344286]
Deep learning models require an enormous amount of data for training.
Recently there is a shift in machine learning from model-centric to data-centric approaches.
We propose CLIP, i.e., Curriculum Learning with Iterative data Pruning.
arXiv Detail & Related papers (2022-12-02T21:29:48Z)
- Physics-based Digital Twins for Autonomous Thermal Food Processing: Efficient, Non-intrusive Reduced-order Modeling [0.0]
This paper proposes a physics-based, data-driven Digital Twin framework for autonomous food processing.
A correlation between a high standard deviation of the surface temperatures in the training data and a low root mean square error in ROM testing enables efficient selection of training data.
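That correlation suggests a simple ranking heuristic for choosing training signals. The sketch below is one illustrative reading of it; `ds.surface_temperatures` is a hypothetical (time steps x sensors) array, not an interface from the paper.

```python
import numpy as np

def rank_training_candidates(candidates):
    """Order candidate training simulations by the standard deviation of
    their surface-temperature histories: per the reported correlation,
    higher deviation tends to coincide with lower ROM test error."""
    return sorted(candidates,
                  key=lambda ds: np.std(ds.surface_temperatures),
                  reverse=True)
```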
arXiv Detail & Related papers (2022-09-07T10:58:38Z)
- Data Aggregation for Reducing Training Data in Symbolic Regression [0.0]
This work discusses methods to reduce the training data and thereby also the runtime of genetic programming.
K-means clustering and data binning are used for data aggregation and compared with random sampling as the simplest data reduction method.
The performance of genetic programming is compared with random forests and linear regression.
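The aggregation strategies compared there can be sketched generically; the snippet below is an illustration under assumed array inputs, not the paper's code.

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregate_by_kmeans(X: np.ndarray, y: np.ndarray, n: int):
    """Reduce a regression training set to n aggregated points by
    clustering the inputs and averaging the targets per cluster."""
    km = KMeans(n_clusters=n, n_init=10, random_state=0).fit(X)
    y_agg = np.array([y[km.labels_ == c].mean() for c in range(n)])
    return km.cluster_centers_, y_agg

def aggregate_by_sampling(X: np.ndarray, y: np.ndarray, n: int, seed: int = 0):
    """Baseline: plain random subsampling, the simplest reduction method."""
    idx = np.random.default_rng(seed).choice(len(X), size=n, replace=False)
    return X[idx], y[idx]
```

(Data binning, the third method mentioned, would similarly replace each bin of inputs with a single representative point.)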
arXiv Detail & Related papers (2021-08-24T11:58:17Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- Hybrid modeling: Applications in real-time diagnosis [64.5040763067757]
We outline a novel hybrid modeling approach that combines machine-learning-inspired models and physics-based models. We use such models in real-time diagnosis applications.
arXiv Detail & Related papers (2020-03-04T00:44:57Z)