Strong identifiability and parameter learning in regression with
heterogeneous response
- URL: http://arxiv.org/abs/2212.04091v2
- Date: Sun, 28 Jan 2024 15:52:21 GMT
- Title: Strong identifiability and parameter learning in regression with
heterogeneous response
- Authors: Dat Do, Linh Do, XuanLong Nguyen
- Abstract summary: We investigate conditions of strong identifiability, rates of convergence for conditional density and parameter estimation, and the Bayesian posterior contraction behavior arising in finite mixture of regression models.
We provide simulation studies and data illustrations, which shed some light on the parameter learning behavior found in several popular regression mixture models reported in the literature.
- Score: 5.503319042839695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixtures of regression are a powerful class of models for regression learning
with respect to a highly uncertain and heterogeneous response variable of
interest. In addition to being a rich predictive model for the response given
some covariates, the parameters in this model class provide useful information
about the heterogeneity in the data population, which is represented by the
conditional distributions for the response given the covariates associated with
a number of distinct but latent subpopulations. In this paper, we investigate
conditions of strong identifiability, rates of convergence for conditional
density and parameter estimation, and the Bayesian posterior contraction
behavior arising in finite mixture of regression models, under exact-fitted and
over-fitted settings and when the number of components is unknown. This theory
is applicable to common choices of link functions and families of conditional
distributions employed by practitioners. We provide simulation studies and data
illustrations, which shed some light on the parameter learning behavior found
in several popular regression mixture models reported in the literature.
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Multi-Response Heteroscedastic Gaussian Process Models and Their
Inference [1.52292571922932]
We propose a novel framework for the modeling of heteroscedastic covariance functions.
We employ variational inference to approximate the posterior and facilitate posterior predictive modeling.
We show that our proposed framework offers a robust and versatile tool for a wide array of applications.
arXiv Detail & Related papers (2023-08-29T15:06:47Z) - Modeling High-Dimensional Data with Unknown Cut Points: A Fusion
Penalized Logistic Threshold Regression [2.520538806201793]
In traditional logistic regression models, the link function is often assumed to be linear and continuous in predictors.
We consider a threshold model that all continuous features are discretized into ordinal levels, which further determine the binary responses.
We find the lasso model is well suited in the problem of early detection and prediction for chronic disease like diabetes.
arXiv Detail & Related papers (2022-02-17T04:16:40Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Spatially and Robustly Hybrid Mixture Regression Model for Inference of
Spatial Dependence [15.988679065054498]
We propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain.
Our method integrates the robust finite mixture Gaussian regression model with spatial constraints, to simultaneously handle the spatial nonstationarity, local homogeneity, and outliers.
Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method.
arXiv Detail & Related papers (2021-09-01T16:29:46Z) - Convex Latent Effect Logit Model via Sparse and Low-rank Decomposition [2.1915057426589746]
We propose a convexparametric convexparametric formulation for learning logistic regression model (logit) with latent heterogeneous effect on sub-population.
Despite its popularity, the mixed logit approach for learning individual heterogeneity has several downsides.
arXiv Detail & Related papers (2021-08-22T22:23:39Z) - Variable selection with missing data in both covariates and outcomes:
Imputation and machine learning [1.0333430439241666]
The missing data issue is ubiquitous in health studies.
Machine learning methods weaken parametric assumptions.
XGBoost and BART have the overall best performance across various settings.
arXiv Detail & Related papers (2021-04-06T20:18:29Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Two-step penalised logistic regression for multi-omic data with an
application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.