Related papers: Projection-based multifidelity linear regression for data-scarce applications

Projection-based multifidelity linear regression for data-scarce applications

URL: http://arxiv.org/abs/2508.08517v1
Date: Mon, 11 Aug 2025 22:55:04 GMT
Title: Projection-based multifidelity linear regression for data-scarce applications
Authors: Vignesh Sella, Julie Pham, Karen Willcox, Anirban Chaudhuri,
Abstract summary: This work develops multifidelity methods for multiple-input multiple-output linear regression targeting data-limited applications with high-dimensional outputs.<n>The proposed multifidelity linear regression methods are demonstrated on approximating the surface pressure field of a hypersonic vehicle in flight.
Score: 0.1874930567916036
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Surrogate modeling for systems with high-dimensional quantities of interest remains challenging, particularly when training data are costly to acquire. This work develops multifidelity methods for multiple-input multiple-output linear regression targeting data-limited applications with high-dimensional outputs. Multifidelity methods integrate many inexpensive low-fidelity model evaluations with limited, costly high-fidelity evaluations. We introduce two projection-based multifidelity linear regression approaches that leverage principal component basis vectors for dimensionality reduction and combine multifidelity data through: (i) a direct data augmentation using low-fidelity data, and (ii) a data augmentation incorporating explicit linear corrections between low-fidelity and high-fidelity data. The data augmentation approaches combine high-fidelity and low-fidelity data into a unified training set and train the linear regression model through weighted least squares with fidelity-specific weights. Various weighting schemes and their impact on regression accuracy are explored. The proposed multifidelity linear regression methods are demonstrated on approximating the surface pressure field of a hypersonic vehicle in flight. In a low-data regime of no more than ten high-fidelity samples, multifidelity linear regression achieves approximately 3% - 12% improvement in median accuracy compared to single-fidelity methods with comparable computational cost.

Related papers

Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout [62.73150122809138]
Federated Learning (FL) is a promising distributed machine learning approach that enables collaborative training of a global model using multiple edge devices.<n>We propose the FedDHAD FL framework, which comes with two novel methods: Dynamic Heterogeneous model aggregation (FedDH) and Adaptive Dropout (FedAD)<n>The combination of these two methods makes FedDHAD significantly outperform state-of-the-art solutions in terms of accuracy (up to 6.7% higher), efficiency (up to 2.02 times faster), and cost (up to 15.0% smaller)
arXiv Detail & Related papers (2025-07-14T16:19:00Z)
In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data.<n>Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z)
Provably Efficient Online RLHF with One-Pass Reward Modeling [59.30310692855397]
We propose a one-pass reward modeling method that does not require storing the historical data and can be computed in constant time.<n>We provide theoretical guarantees showing that our method improves both statistical and computational efficiency.<n>We conduct experiments using Llama-3-8B-Instruct and Qwen2.5-7B-Instruct models on the Ultrafeedback-binarized and Mixture2 datasets.
arXiv Detail & Related papers (2025-02-11T02:36:01Z)
Optimizing Pretraining Data Mixtures with LLM-Estimated Utility [52.08428597962423]
Large Language Models improve with increasing amounts of high-quality training data.<n>We find token-counts outperform manual and learned mixes, indicating that simple approaches for dataset size and diversity are surprisingly effective.<n>We propose two complementary approaches: UtiliMax, which extends token-based $200s by incorporating utility estimates from reduced-scale ablations, achieving up to a 10.6x speedup over manual baselines; and Model Estimated Data Utility (MEDU), which leverages LLMs to estimate data utility from small samples, matching ablation-based performance while reducing computational requirements by $simx.
arXiv Detail & Related papers (2025-01-20T21:10:22Z)
Practical multi-fidelity machine learning: fusion of deterministic and Bayesian models [0.34592277400656235]
Multi-fidelity machine learning methods integrate scarce, resource-intensive high-fidelity data with abundant but less accurate low-fidelity data.<n>We propose a practical multi-fidelity strategy for problems spanning low- and high-dimensional domains.
arXiv Detail & Related papers (2024-07-21T10:40:50Z)
Multifidelity Surrogate Models: A New Data Fusion Perspective [0.0]
Multifidelity surrogate modelling combines data of varying accuracy and cost from different sources. It strategically uses low-fidelity models for rapid evaluations, saving computational resources. It improves decision-making by addressing uncertainties and surpassing the limits of single-fidelity models.
arXiv Detail & Related papers (2024-04-21T11:21:47Z)
Multifidelity linear regression for scientific machine learning from scarce data [0.0]
We propose a new multifidelity training approach for scientific machine learning via linear regression. We provide bias and variance analysis of our new estimators that guarantee the approach's accuracy and improved robustness to scarce high-fidelity data.
arXiv Detail & Related papers (2024-03-13T15:40:17Z)
Disentangled Multi-Fidelity Deep Bayesian Active Learning [19.031567953748453]
Multi-fidelity active learning aims to learn a direct mapping from input parameters to simulation outputs at the highest fidelity. Deep learning-based methods often impose a hierarchical structure in hidden representations, which only supports passing information from low-fidelity to high-fidelity. We propose a novel framework called Disentangled Multi-fidelity Deep Bayesian Active Learning (D-MFDAL), which learns the surrogate models conditioned on the distribution of functions at multiple fidelities.
arXiv Detail & Related papers (2023-05-07T23:14:58Z)
Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs. We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling. We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z)
Multi-fidelity regression using artificial neural networks: efficient approximation of parameter-dependent output quantities [0.17499351967216337]
We present the use of artificial neural networks applied to multi-fidelity regression problems. The introduced models are compared against a traditional multi-fidelity scheme, co-kriging. We also show an application of multi-fidelity regression to an engineering problem.
arXiv Detail & Related papers (2021-02-26T11:29:00Z)
A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of regression problem, where the correspondence between input and output data is not available. Most existing methods are only applicable when the sample size is small. We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.