Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models
- URL: http://arxiv.org/abs/2409.17267v2
- Date: Mon, 03 Mar 2025 18:41:55 GMT
- Title: Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models
- Authors: Théo Bourdais, Houman Owhadi,
- Abstract summary: We introduce Minimal Empirical Variance Aggregation (MEVA), a data-driven framework that integrates predictions from various models.<n>This non-intrusive, model-agnostic approach treats the contributing models as black boxes and accommodates outputs from diverse methodologies.
- Score: 0.2455468619225742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whether deterministic or stochastic, models can be viewed as functions designed to approximate a specific quantity of interest. We introduce Minimal Empirical Variance Aggregation (MEVA), a data-driven framework that integrates predictions from various models, enhancing overall accuracy by leveraging the individual strengths of each. This non-intrusive, model-agnostic approach treats the contributing models as black boxes and accommodates outputs from diverse methodologies, including machine learning algorithms and traditional numerical solvers. We advocate for a point-wise linear aggregation process and consider two methods for optimizing this aggregate: Minimal Error Aggregation (MEA), which minimizes the prediction error, and Minimal Variance Aggregation (MVA), which focuses on reducing variance. We prove a theorem showing that MVA can be more robustly estimated from data than MEA, making MEVA superior to Minimal Empirical Error Aggregation (MEEA). Unlike MEEA, which interpolates target values directly, MEVA formulates aggregation as an error estimation problem, which can be performed using any backbone learning paradigm. We demonstrate the versatility and effectiveness of our framework across various applications, including data science and partial differential equations, illustrating its ability to significantly enhance both robustness and accuracy.
Related papers
- Joint Explainability-Performance Optimization With Surrogate Models for AI-Driven Edge Services [3.8688731303365533]
This paper explores the balance between the predictive accuracy of complex AI models and their approximation by surrogate ones.
We introduce a new algorithm based on multi-objective optimization (MOO) to simultaneously minimize both the complex model's prediction error and the error between its outputs and those of the surrogate.
arXiv Detail & Related papers (2025-03-10T19:04:09Z) - On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling on a continuous domain for the data prediction task of (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
We propose a novel non-parametric method for approximating the sum of conditional probability densities required by MIS.
arXiv Detail & Related papers (2024-10-11T18:02:46Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP)
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration [0.6906005491572401]
We show that Information Bottleneck-based IRM achieves consistent calibration across different environments.
Our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated.
arXiv Detail & Related papers (2024-01-31T02:08:43Z) - Maintaining Stability and Plasticity for Predictive Churn Reduction [8.971668467496055]
We propose a solution called Accumulated Model Combination (AMC)
AMC is a general technique and we propose several instances of it, each having their own advantages depending on the model and data properties.
arXiv Detail & Related papers (2023-05-06T20:56:20Z) - Mixed Semi-Supervised Generalized-Linear-Regression with Applications to Deep-Learning and Interpolators [6.537685198688539]
We present a methodology for using unlabeled data to design semi supervised learning (SSL) methods.
We include in each of them a mixing parameter $alpha$, controlling the weight given to the unlabeled data.
We demonstrate the effectiveness of our methodology in delivering substantial improvement compared to the standard supervised models.
arXiv Detail & Related papers (2023-02-19T09:55:18Z) - A prediction and behavioural analysis of machine learning methods for
modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z) - Building Robust Machine Learning Models for Small Chemical Science Data:
The Case of Shear Viscosity [3.4761212729163313]
We train several Machine Learning models to predict the shear viscosity of a Lennard-Jones (LJ) fluid.
Specifically, the issues related to model selection, performance estimation and uncertainty quantification were investigated.
arXiv Detail & Related papers (2022-08-23T07:33:14Z) - RMFGP: Rotated Multi-fidelity Gaussian process with Dimension Reduction
for High-dimensional Uncertainty Quantification [12.826754199680474]
Multi-fidelity modelling enables accurate inference even when only a small set of accurate data is available.
By combining the realizations of the high-fidelity model with one or more low-fidelity models, the multi-fidelity method can make accurate predictions of quantities of interest.
This paper proposes a new dimension reduction framework based on rotated multi-fidelity Gaussian process regression and a Bayesian active learning scheme.
arXiv Detail & Related papers (2022-04-11T01:20:35Z) - Optimal Model Averaging: Towards Personalized Collaborative Learning [0.0]
In federated learning, differences in the data or objectives between the participating nodes motivate approaches to train a personalized machine learning model for each node.
One such approach is weighted averaging between a locally trained model and the global model.
We find that there is always some positive amount of model averaging that reduces the expected squared error compared to the local model.
arXiv Detail & Related papers (2021-10-25T13:33:20Z) - Information Theoretic Structured Generative Modeling [13.117829542251188]
A novel generative model framework called the structured generative model (SGM) is proposed that makes straightforward optimization possible.
The implementation employs a single neural network driven by an orthonormal input to a single white noise source adapted to learn an infinite Gaussian mixture model.
Preliminary results show that SGM significantly improves MINE estimation in terms of data efficiency and variance, conventional and variational Gaussian mixture models, as well as for training adversarial networks.
arXiv Detail & Related papers (2021-10-12T07:44:18Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores)
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE)
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.