A Multi-fidelity Double-Delta Wing Dataset and Empirical Scaling Laws for GNN-based Aerodynamic Field Surrogate
- URL: http://arxiv.org/abs/2512.20941v1
- Date: Wed, 24 Dec 2025 04:53:11 GMT
- Title: A Multi-fidelity Double-Delta Wing Dataset and Empirical Scaling Laws for GNN-based Aerodynamic Field Surrogate
- Authors: Yiren Shen, Juan J. Alonso
- Abstract summary: This study investigates the relationship between training data size and prediction accuracy for a graph neural network based surrogate model. We release an open-source, multi-fidelity aerodynamic dataset for double-delta wings. We find that the test error decreases with data size with a power-law exponent of -0.6122, indicating efficient data utilization.
- Score: 0.10312968200748117
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data-driven surrogate models are increasingly adopted to accelerate vehicle design. However, open-source multi-fidelity datasets and empirical guidelines linking dataset size to model performance remain limited. This study investigates the relationship between training data size and prediction accuracy for a graph neural network (GNN) based surrogate model for aerodynamic field prediction. We release an open-source, multi-fidelity aerodynamic dataset for double-delta wings, comprising 2448 flow snapshots across 272 geometries evaluated at angles of attack from 11° to 19° at Ma = 0.3 using both Vortex Lattice Method (VLM) and Reynolds-Averaged Navier-Stokes (RANS) solvers. The geometries are generated using a nested Saltelli sampling scheme to support future dataset expansion and variance-based sensitivity analysis. Using this dataset, we conduct a preliminary empirical scaling study of the MF-VortexNet surrogate by constructing six training datasets with sizes ranging from 40 to 1280 snapshots and training models with 0.1 to 2.4 million parameters under a fixed training budget. We find that the test error decreases with data size with a power-law exponent of -0.6122, indicating efficient data utilization. Based on this scaling law, we estimate that the optimal sampling density is approximately eight samples per dimension in a d-dimensional design space. The results also suggest improved data utilization efficiency for larger surrogate models, implying a potential trade-off between dataset generation cost and model training budget.
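The power-law relationship described in the abstract can be illustrated with a log-log least-squares fit. The sketch below is not the authors' code: the six training sizes are assumed to double from 40 to 1280 (the abstract gives only the range and the count), the error values are synthetic and generated from the reported exponent with an arbitrary prefactor, and `fit_power_law` is a hypothetical helper name. It also assumes angles of attack in 1-degree steps, which is consistent with the reported totals but not stated in the abstract.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**b by linear least squares in log-log space."""
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # polyfit returns the highest-degree coefficient first: slope, then intercept.
    b, log_a = np.polyfit(log_n, log_e, 1)
    return float(np.exp(log_a)), float(b)


# ASSUMPTION: six sizes spaced by doubling; the abstract states only "40 to 1280".
sizes = [40, 80, 160, 320, 640, 1280]

# SYNTHETIC errors built from the reported exponent; the prefactor is arbitrary.
reported_exponent = -0.6122
prefactor = 1.0
errors = [prefactor * n ** reported_exponent for n in sizes]

a, b = fit_power_law(sizes, errors)
print(f"fitted prefactor a = {a:.4f}, fitted exponent b = {b:.4f}")

# Error ratio implied by doubling the training set under this power law.
doubling_ratio = 2.0 ** b
print(f"error ratio after doubling the data: {doubling_ratio:.3f}")

# ASSUMPTION: 1-degree steps from 11 to 19 degrees give 9 angles per geometry.
n_angles = len(range(11, 20))
n_snapshots = 272 * n_angles
print(f"snapshots from 272 geometries x {n_angles} angles: {n_snapshots}")
```

Under an exponent of -0.6122, doubling the training set multiplies the test error by roughly 0.65. With real measurements, the fitted exponent would carry uncertainty that this noise-free sketch does not show.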
Related papers
- BlendedNet++: A Large-Scale Blended Wing Body Aerodynamics Dataset and Benchmark [4.128170991070137]
We introduce BlendedNet++, a large-scale aerodynamic dataset and benchmark focused on blended wing body (BWB) aircraft. The dataset contains over 12,000 unique geometries, each simulated at a single flight condition, yielding 12,490 aerodynamic results for steady RANS CFD. Using this dataset, we standardize a forward-surrogate benchmark to predict pointwise fields across six model families. We present an inverse design task of achieving a specified lift-to-drag ratio under fixed flight conditions, implemented via a conditional diffusion model.
arXiv Detail & Related papers (2025-12-02T22:39:07Z)
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
- FoilDiff: A Hybrid Transformer Backbone for Diffusion-based Modelling of 2D Airfoil Flow Fields [1.5749416770494706]
FoilDiff is a diffusion-based surrogate model with a hybrid-backbone denoising network. It can provide both more accurate predictions and better-calibrated predictive uncertainty than existing diffusion-based models.
arXiv Detail & Related papers (2025-10-05T19:10:38Z)
- TGLF-SINN: Deep Learning Surrogate Model for Accelerating Turbulent Transport Modeling in Fusion [18.028061388104963]
We propose TGLF-SINN (Spectra-Informed Neural Network) with three key innovations. Our approach achieves superior performance with significantly less training data. In downstream flux matching applications, our NN surrogate provides 45x speedup over TGLF while maintaining comparable accuracy.
arXiv Detail & Related papers (2025-09-07T09:36:51Z)
- Fusing CFD and measurement data using transfer learning [49.1574468325115]
We introduce a non-linear method based on neural networks combining simulation and measurement data via transfer learning. In a first step, the neural network is trained on simulation data to learn spatial features of the distributed quantities. The second step involves transfer learning on the measurement data to correct for systematic errors between simulation and measurement by only re-training a small subset of the entire neural network model.
arXiv Detail & Related papers (2025-07-28T07:21:46Z)
- Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices. We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD). The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller).
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
- Benchmarking Convolutional Neural Network and Graph Neural Network based Surrogate Models on a Real-World Car External Aerodynamics Dataset [0.0]
We present a comparative evaluation of two surrogate modeling approaches for predicting drag on a real-world dataset. Our results show that the CNN-based method achieves a mean absolute error of 2.3 drag counts, while the GNN-based method achieves 3.8. While both methods effectively capture the broader trends between baseline groups, they struggle to varying extents in capturing the finer intra-baseline group variations.
arXiv Detail & Related papers (2025-04-09T09:04:59Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Pre-training on Synthetic Driving Data for Trajectory Prediction [61.520225216107306]
We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting.
We adopt HD map augmentation and trajectory synthesis for generating driving data, and then we learn representations by pre-training on them.
We conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies.
arXiv Detail & Related papers (2023-09-18T19:49:22Z)
- Accurate deep learning sub-grid scale models for large eddy simulations [0.0]
We present two families of sub-grid scale (SGS) turbulence models developed for large-eddy simulation (LES) purposes.
Their development required the formulation of physics-informed robust and efficient Deep Learning (DL) algorithms.
Explicit filtering of data from direct simulations of canonical channel flow at two friction Reynolds numbers provided accurate data for training and testing.
arXiv Detail & Related papers (2023-07-19T15:30:06Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.