A non-asymptotic penalization criterion for model selection in mixture
of experts models
- URL: http://arxiv.org/abs/2104.02640v1
- Date: Tue, 6 Apr 2021 16:24:55 GMT
- Title: A non-asymptotic penalization criterion for model selection in mixture
of experts models
- Authors: TrungTin Nguyen, Hien Duy Nguyen, Faicel Chamroukhi and Florence
Forbes
- Abstract summary: We consider the Gaussian-gated localized MoE (GLoME) regression model for modeling heterogeneous data.
This model poses challenging questions with respect to the statistical estimation and model selection problems.
We study the problem of estimating the number of components of the GLoME model, in a penalized maximum likelihood estimation framework.
- Score: 1.491109220586182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixture of experts (MoE) is a popular class of models in statistics and
machine learning that has sustained attention over the years, due to its
flexibility and effectiveness. We consider the Gaussian-gated localized MoE
(GLoME) regression model for modeling heterogeneous data. This model poses
challenging questions with respect to the statistical estimation and model
selection problems, including feature selection, both from the computational
and theoretical points of view. We study the problem of estimating the number
of components of the GLoME model, in a penalized maximum likelihood estimation
framework. We provide a lower bound on the penalty that ensures a weak oracle
inequality is satisfied by our estimator. To support our theoretical result, we
perform numerical experiments on simulated and real data, which illustrate the
performance of our finite-sample oracle inequality.
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z) - Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z) - On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z) - fairml: A Statistician's Take on Fair Machine Learning Modelling [0.0]
We describe the fairml package which implements our previous work (Scutari, Panero, and Proissl 2022) and related models in the literature.
fairml is designed around classical statistical models and penalised regression results.
The constraint used to enforce fairness is to model estimation, making it possible to mix-and-match the desired model family and fairness definition for each application.
arXiv Detail & Related papers (2023-05-03T09:59:53Z) - A prediction and behavioural analysis of machine learning methods for
modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z) - Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation [24.65301562548798]
We study the problem of model selection in causal inference, specifically for conditional average treatment effect (CATE) estimation.
We conduct an empirical analysis to benchmark the surrogate model selection metrics introduced in the literature, as well as the novel ones introduced in this work.
arXiv Detail & Related papers (2022-11-03T16:26:06Z) - Characterizing Fairness Over the Set of Good Models Under Selective
Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z) - On Statistical Efficiency in Learning [37.08000833961712]
We address the challenge of model selection to strike a balance between model fitting and model complexity.
We propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce cost.
Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
arXiv Detail & Related papers (2020-12-24T16:08:29Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.