An adaptive sampling algorithm for data-generation to build a data-manifold for physical problem surrogate modeling
- URL: http://arxiv.org/abs/2505.08487v1
- Date: Tue, 13 May 2025 12:17:10 GMT
- Title: An adaptive sampling algorithm for data-generation to build a data-manifold for physical problem surrogate modeling
- Authors: Chetra Mang, Axel TahmasebiMoradi, David Danan, Mouadh Yagoubi,
- Abstract summary: We present an Adaptive Sampling Algorithm for Data Generation (ASADG) involving a physical model.<n>We demonstrate the efficiency of the data sampling algorithm in comparison with LHS method for generating more representative input data.
- Score: 0.7499722271664147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Physical models classically involved Partial Differential equations (PDE) and depending of their underlying complexity and the level of accuracy required, and known to be computationally expensive to numerically solve them. Thus, an idea would be to create a surrogate model relying on data generated by such solver. However, training such a model on an imbalanced data have been shown to be a very difficult task. Indeed, if the distribution of input leads to a poor response manifold representation, the model may not learn well and consequently, it may not predict the outcome with acceptable accuracy. In this work, we present an Adaptive Sampling Algorithm for Data Generation (ASADG) involving a physical model. As the initial input data may not accurately represent the response manifold in higher dimension, this algorithm iteratively adds input data into it. At each step the barycenter of each simplicial complex, that the manifold is discretized into, is added as new input data, if a certain threshold is satisfied. We demonstrate the efficiency of the data sampling algorithm in comparison with LHS method for generating more representative input data. To do so, we focus on the construction of a harmonic transport problem metamodel by generating data through a classical solver. By using such algorithm, it is possible to generate the same number of input data as LHS while providing a better representation of the response manifold.
Related papers
- Method of data forward generation with partial differential equations for machine learning modeling in fluid mechanics [1.9688252014450927]
This study proposes a high-efficient data forward generation method from the partial differential equations (PDEs)<n>A Poisson neural network (Poisson-NN) embedded in projection method and a wavelet transform convolutional neuro network (WTCNN) embedded in multigrid numerical simulation for solving incompressible Navier-Stokes equations is respectively proposed.
arXiv Detail & Related papers (2025-01-06T15:17:13Z) - Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference [5.522612010562183]
Modular-DCM is the first algorithm that, given the causal structure, uses adversarial training to learn the network weights.
We show our algorithm's convergence on the COVIDx dataset and its utility with a causal invariant prediction problem on CelebA-HQ.
arXiv Detail & Related papers (2024-01-02T20:31:15Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Score-based Diffusion Models in Function Space [137.70916238028306]
Diffusion models have recently emerged as a powerful framework for generative modeling.<n>This work introduces a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.<n>We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - The data-driven physical-based equations discovery using evolutionary
approach [77.34726150561087]
We describe the algorithm for the mathematical equations discovery from the given observations data.
The algorithm combines genetic programming with the sparse regression.
It could be used for governing analytical equation discovery as well as for partial differential equations (PDE) discovery.
arXiv Detail & Related papers (2020-04-03T17:21:57Z) - Semi-analytic approximate stability selection for correlated data in
generalized linear models [3.42658286826597]
We propose a novel approximate inference algorithm that can conduct Stability Selection without the repeated fitting.
The algorithm is based on the replica method of statistical mechanics and vector approximate message passing of information theory.
Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.
arXiv Detail & Related papers (2020-03-19T10:43:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.