Prediction Sets for High-Dimensional Mixture of Experts Models
- URL: http://arxiv.org/abs/2210.16710v1
- Date: Sun, 30 Oct 2022 00:27:19 GMT
- Title: Prediction Sets for High-Dimensional Mixture of Experts Models
- Authors: Adel Javanmard, Simeng Shao, Jacob Bien
- Abstract summary: We show how to construct valid prediction sets for an $\ell_1$-penalized mixture of experts model in the high-dimensional setting.
We make use of a debiasing procedure to account for the bias induced by the penalization and propose a novel strategy for combining intervals to form a prediction set.
- Score: 9.195729979000404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large datasets make it possible to build predictive models that can capture
heterogeneous relationships between the response variable and features. The
mixture of high-dimensional linear experts model posits that observations come
from a mixture of high-dimensional linear regression models, where the mixture
weights are themselves feature-dependent. In this paper, we show how to
construct valid prediction sets for an $\ell_1$-penalized mixture of experts
model in the high-dimensional setting. We make use of a debiasing procedure to
account for the bias induced by the penalization and propose a novel strategy
for combining intervals to form a prediction set with coverage guarantees in
the mixture setting. Synthetic examples and an application to the prediction of
critical temperatures of superconducting materials show our method to have
reliable practical performance.
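For intuition, here is a minimal sketch of the recipe the abstract outlines: fit an $\ell_1$-penalized expert per mixture component, apply a crude debiasing step, form one interval per expert, and retain intervals in order of estimated mixture weight until the target coverage mass is reached. The function names, the one-step least-squares "debiasing" stand-in, and the fixed normal quantile are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def prediction_set(X, y, z, x_new, alpha=0.1, lam=0.1):
    """Union-of-intervals prediction set at x_new; z holds estimated
    component labels in {0, ..., K-1} (e.g., from an EM pass).
    Assumes centered data (no intercepts) purely for brevity."""
    gate = LogisticRegression(max_iter=1000).fit(X, z)  # feature-dependent weights
    weights = gate.predict_proba(x_new[None, :])[0]
    intervals = []
    for k in range(len(gate.classes_)):
        Xk, yk = X[z == k], y[z == k]
        lasso = Lasso(alpha=lam, fit_intercept=False).fit(Xk, yk)
        # Crude stand-in for the paper's debiasing step: refit ordinary
        # least squares on the lasso's selected support.
        S = np.flatnonzero(lasso.coef_)
        beta = np.zeros(X.shape[1])
        if S.size:
            beta[S] = np.linalg.lstsq(Xk[:, S], yk, rcond=None)[0]
        sigma = (yk - Xk @ beta).std()
        mu = float(x_new @ beta)
        q = 1.96  # placeholder quantile; the paper calibrates coverage properly
        intervals.append((weights[k], mu - q * sigma, mu + q * sigma))
    # Keep the highest-weight experts' intervals until their combined
    # mixture mass reaches the 1 - alpha target.
    intervals.sort(key=lambda t: -t[0])
    kept, mass = [], 0.0
    for w, lo, hi in intervals:
        kept.append((lo, hi))
        mass += w
        if mass >= 1 - alpha:
            break
    return kept  # the prediction set is the union of these intervals
```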
Related papers
- H-AddiVortes: Heteroscedastic (Bayesian) Additive Voronoi Tessellations [0.0]
The Heteroscedastic AddiVortes model simultaneously models the conditional mean and variance of a response variable.
By employing a sum-of-tessellations approach for the mean and a product-of-tessellations approach for the variance, the model provides a flexible and interpretable means to capture complex, predictor-dependent relationships.
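In symbols, one way to write the structure this summary describes (notation ours, not necessarily the paper's) is a sum of Voronoi-tessellation fits for the mean and a product for the variance:

```latex
y_i \;=\; \sum_{j=1}^{m} g\!\left(x_i;\, T_j, M_j\right) + \varepsilon_i,
\qquad
\varepsilon_i \;\sim\; \mathcal{N}\!\left(0,\; \prod_{l=1}^{m'} h\!\left(x_i;\, S_l, V_l\right)\right),
```

where $g$ returns the mean parameter of the cell of tessellation $T_j$ containing $x_i$ and $h$ the variance multiplier of the cell of $S_l$ containing it, so both moments vary with the predictors.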
arXiv Detail & Related papers (2025-03-17T10:41:31Z)
- Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models [24.396525123797073]
We propose a method to optimize language model pre-training data mixtures through efficient approximation of the cross-entropy loss corresponding to each candidate mixture.
We use this approximation as a source of additional features in a regression model, trained from observations of model loss for a small number of mixtures.
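A hedged sketch of that pipeline, with all data and names invented: per-domain "expert" losses give a cheap mixture-weighted approximation of a candidate mixture's loss, which then serves as a feature in a regression fit on the few mixtures actually trained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_domains = 4
# expert_loss[i, j]: loss on eval domain j of a model trained only on
# domain i (the "data experts"); random stand-in numbers here.
expert_loss = rng.uniform(2.0, 4.0, size=(n_domains, n_domains))

def approx_loss(w):
    # Mixture-of-data-experts approximation of the cross-entropy of a
    # candidate mixture w: weight each expert's per-domain loss by w.
    return w @ expert_loss

# A handful of mixtures that were actually trained, with measured losses
# (simulated here), used to fit the regression model.
trained = rng.dirichlet(np.ones(n_domains), size=8)
measured = (np.array([approx_loss(w).mean() for w in trained])
            + rng.normal(0, 0.05, size=8))

features = np.hstack([trained, np.vstack([approx_loss(w) for w in trained])])
reg = LinearRegression().fit(features, measured)
# reg now scores unseen candidate mixtures cheaply.
```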
arXiv Detail & Related papers (2025-02-21T21:27:48Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) approximates the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance as a function of the mixture proportions.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens in RedPajama.
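As a toy illustration only (the paper's exact functional form and nested scaling-law procedure are richer than this), one can fit an exponential law in the proportions to a few observed runs and then optimize it:

```python
import numpy as np
from scipy.optimize import curve_fit

# Observed (two-domain mixture proportion, validation loss) pairs; made up.
props = np.array([[0.9, 0.1], [0.7, 0.3], [0.5, 0.5], [0.3, 0.7]])
loss = np.array([2.40, 2.31, 2.25, 2.28])

def law(r, c, k, t1, t2):
    # Candidate data-mixing law: L(r) = c + k * exp(t . r)
    return c + k * np.exp(r[:, 0] * t1 + r[:, 1] * t2)

params, _ = curve_fit(law, props, loss, p0=[2.0, 0.1, 0.0, 0.0], maxfev=20000)

# Predict the best mixture on a grid; one would then train at that mixture.
grid = np.linspace(0.0, 1.0, 101)
cands = np.stack([grid, 1.0 - grid], axis=1)
best = cands[np.argmin(law(cands, *params))]
```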
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
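As a minimal stand-in for this idea (the per-model predictive means/variances and model weights below are invented, and real Gaussian process posteriors would replace the normals):

```python
import numpy as np

rng = np.random.default_rng(1)
# Per-model predictive (mean, std) at a test point; in practice these
# would come from fitted Gaussian processes.
preds = [(1.0, 0.3), (1.2, 0.5), (0.8, 0.4)]
model_w = np.array([0.5, 0.3, 0.2])  # stand-in posterior model probabilities

# Monte Carlo fusion: sample a model by its weight, then its predictive.
n = 10_000
which = rng.choice(len(preds), size=n, p=model_w)
samples = np.array([rng.normal(*preds[i]) for i in which])
fused_mean, fused_sd = samples.mean(), samples.std()
```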
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
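The kernel-density case is easy to simulate; this hedged sketch retrains a KDE on its own samples each generation, with a fraction of real data mixed back in:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
real = rng.normal(0.0, 1.0, size=500)

data = real
for gen in range(5):
    kde = gaussian_kde(data)          # "model" of this generation
    synthetic = kde.resample(500)[0]  # samples the next generation trains on
    # Mixed-data training: keep some real data each round, which the
    # summary suggests affects how estimation error propagates.
    data = np.concatenate([synthetic[:400], real[:100]])
    print(gen, round(data.std(), 3))  # watch the distribution drift
```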
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
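Concretely, churn can be read as the fraction of held-out points on which two near-optimal models disagree; a toy version with invented data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, random_state=0)
# Two similarly accurate ("near-optimal") models from the Rashomon set,
# here just the same learner under two regularization strengths.
m1 = LogisticRegression(C=1.0).fit(X[:1000], y[:1000])
m2 = LogisticRegression(C=0.1).fit(X[:1000], y[:1000])
churn = np.mean(m1.predict(X[1000:]) != m2.predict(X[1000:]))
print(f"predictive churn: {churn:.3f}")
```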
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Local Bayesian Dirichlet mixing of imperfect models [0.0]
We study the ability of Bayesian model averaging and mixing techniques to mine nuclear masses.
We show that the global and local mixtures of models reach excellent performance on both prediction accuracy and uncertainty quantification.
arXiv Detail & Related papers (2023-11-02T21:02:40Z)
- Sharing Information Between Machine Tools to Improve Surface Finish Forecasting [0.0]
The authors propose a Bayesian hierarchical model to predict surface-roughness measurements for a turning machining process.
The hierarchical model is compared to multiple independent Bayesian linear regression models to showcase the benefits of partial pooling in a machining setting.
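An empirical-Bayes caricature of the partial pooling being compared (the numbers and plug-in variances are invented; the paper fits a full Bayesian hierarchical model):

```python
import numpy as np

machine_mean = np.array([1.8, 2.4, 2.1])  # per-machine mean roughness (made up)
n_obs = np.array([5, 3, 40])              # observations per machine
s2_within, t2_between = 0.20, 0.05        # plug-in variance components

grand = np.average(machine_mean, weights=n_obs)
shrink = (s2_within / n_obs) / (s2_within / n_obs + t2_between)
pooled = shrink * grand + (1.0 - shrink) * machine_mean
# Sparsely observed machines borrow the most strength from the fleet,
# which is the benefit of sharing information across tools.
```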
arXiv Detail & Related papers (2023-10-09T15:44:35Z)
- Differentiating Metropolis-Hastings to Optimize Intractable Densities [51.16801956665228]
We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers.
We apply gradient-based optimization to objectives expressed as expectations over intractable target densities.
arXiv Detail & Related papers (2023-06-13T17:56:02Z)
- Bayesian Sparse Regression for Mixed Multi-Responses with Application to Runtime Metrics Prediction in Fog Manufacturing [6.288767115532775]
Fog manufacturing can greatly enhance traditional manufacturing systems through distributed Fog computation units.
Predictive offloading methods depend heavily on accurate prediction and uncertainty quantification of runtime performance metrics.
We propose a Bayesian sparse regression for multivariate mixed responses to enhance the prediction of runtime performance metrics.
arXiv Detail & Related papers (2022-10-10T16:14:08Z)
- Time varying regression with hidden linear dynamics [74.9914602730208]
We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.
Counterintuitively, we show that when the underlying dynamics are stable, the parameters of this model can be estimated from data by combining just two ordinary least squares estimates.
arXiv Detail & Related papers (2021-12-29T23:37:06Z)
- An Extended Multi-Model Regression Approach for Compressive Strength Prediction and Optimization of a Concrete Mixture [0.0]
A model-based evaluation of concrete compressive strength is of high value, both for strength prediction and for mixture optimization.
We take a further step towards improving the accuracy of the prediction model via the weighted combination of multiple regression methods.
A genetic algorithm (GA)-based mixture optimization is then proposed, building on the obtained multi-regression model.
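A generic version of the weighted combination step (not the paper's exact scheme; the data and the weight-fitting choice here are illustrative):

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.random((100, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.standard_normal(100)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

models = [Ridge().fit(X_tr, y_tr),
          RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr),
          SVR().fit(X_tr, y_tr)]
P = np.column_stack([m.predict(X_val) for m in models])
w, _ = nnls(P, y_val)  # non-negative weights minimizing validation error
w /= w.sum()           # convex combination over the regression methods
# A GA would then search mixture proportions against this combined model.
```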
arXiv Detail & Related papers (2021-06-13T16:10:32Z)
- A similarity-based Bayesian mixture-of-experts model [0.5156484100374058]
We present a new non-parametric mixture-of-experts model for multivariate regression problems.
Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point.
Posterior inference is performed on the parameters of the mixture as well as the distance metric.
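Stripped of the Bayesian machinery, the similarity-weighting idea looks like a kernel mixture over observed points (this Nadaraya-Watson-style stand-in is ours, not the paper's model):

```python
import numpy as np

def predict(x_new, X, y, length_scale=1.0):
    # Weight each training point's "expert" (here, just its response)
    # by a Gaussian similarity to the query point, then mix.
    d2 = ((X - x_new) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * length_scale ** 2))
    return float((w * y).sum() / w.sum())
```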
arXiv Detail & Related papers (2020-12-03T18:08:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.