Component-wise Adaptive Trimming For Robust Mixture Regression
- URL: http://arxiv.org/abs/2005.11599v3
- Date: Mon, 19 Apr 2021 15:54:18 GMT
- Title: Component-wise Adaptive Trimming For Robust Mixture Regression
- Authors: Wennan Chang, Xinyu Zhou, Yong Zang, Chi Zhang, Sha Cao
- Abstract summary: Existing robust mixture regression methods suffer from outliers as they either conduct parameter estimation in the presence of outliers, or rely on prior knowledge of the level of contamination.
Here we propose a fast and efficient robust mixture regression algorithm called the Component-wise Adaptive Trimming (CAT) method.
- Score: 15.633993488010292
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Parameter estimation of the mixture regression model using the expectation
maximization (EM) algorithm is highly sensitive to outliers. Here we propose a
fast and efficient robust mixture regression algorithm, called Component-wise
Adaptive Trimming (CAT) method. We consider simultaneous outlier detection and
robust parameter estimation to minimize the effect of outlier contamination.
Robust mixture regression has many important applications, including in human
cancer genomics data, where the population often displays strong heterogeneity
compounded by unwanted technological perturbations. Existing robust mixture
regression methods suffer from outliers as they either conduct parameter
estimation in the presence of outliers, or rely on prior knowledge of the level
of outlier contamination. CAT is implemented in the framework of
classification expectation maximization, under which a natural definition of
outliers can be derived. It applies a least trimmed squares (LTS) approach
within each exclusive mixing component, so that the robustness problem is
reduced from the mixture case to the simple linear regression case. The high
breakdown point of the LTS approach allows us to avoid pre-specifying the
trimming parameter. Compared with multiple existing algorithms, CAT is the
most competitive, adaptively trimming off outliers as well as heavy-tailed
noise across different scenarios of simulated and real genomic data. CAT is
implemented in the R package 'RobMixReg', available on CRAN.
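
As a rough illustration of the pipeline the abstract describes (classification EM with a least trimmed squares fit inside each hard-assigned component), here is a minimal Python sketch. It is not the authors' implementation: the function names, the fixed scaled-residual cutoff of 2.5, and the subset size h are illustrative assumptions of this sketch; the reference implementation is the R package 'RobMixReg'.

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, n_csteps=10, rng=None):
    """Least trimmed squares: minimize the sum of the h smallest squared
    residuals, using random elemental starts refined by concentration
    steps (a simplified FAST-LTS)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_beta, best_obj = np.zeros(p), np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=min(p + 1, n), replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        for _ in range(n_csteps):                    # concentration steps
            keep = np.argsort((y - X @ beta) ** 2)[:h]
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, np.sqrt(best_obj / h) + 1e-12  # fit and crude trimmed scale

def cat_cem(X, y, K=2, n_iter=20, cutoff=2.5, seed=0):
    """Classification EM with an LTS fit inside each hard-assigned component,
    so robustness is handled per component as in simple linear regression."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    z = rng.integers(K, size=n)                      # random initial hard labels
    betas, scales = [np.zeros(p) for _ in range(K)], np.ones(K)
    R = np.zeros((n, K))
    for _ in range(n_iter):
        for k in range(K):                           # M-step: robust fit per component
            idx = np.where(z == k)[0]
            if len(idx) <= p + 1:
                continue                             # keep previous fit for tiny components
            h = (len(idx) + p + 1) // 2              # high-breakdown subset size ~ n_k / 2
            betas[k], scales[k] = lts_fit(X[idx], y[idx], h, rng=rng)
        R = np.column_stack([np.abs(y - X @ b) / s   # scaled residual to each line
                             for b, s in zip(betas, scales)])
        z = R.argmin(axis=1)                         # C-step: hard reassignment
    outliers = R.min(axis=1) > cutoff                # far from every component
    return betas, z, outliers

# Example use on a design matrix X with an intercept column:
# betas, labels, outliers = cat_cem(X, y, K=2)
```

Because each component is fit by LTS with h roughly half the component size, a large fraction of the points assigned to a component can be grossly contaminated without breaking the fit, which is what allows the method to avoid a user-specified trimming level.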
Related papers
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Error Reduction from Stacked Regressions [12.657895453939298]
Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy.
In this paper, we learn these weights by minimizing a regularized version of the empirical risk subject to a nonnegativity constraint.
Thanks to an adaptive shrinkage effect, the resulting stacked estimator has strictly smaller population risk than the best single estimator among them (a toy sketch of the weight-learning step appears after this list).
arXiv Detail & Related papers (2023-09-18T15:42:12Z)
- Stability-Adjusted Cross-Validation for Sparse Linear Regression [5.156484100374059]
Cross-validation techniques like k-fold cross-validation substantially increase the computational cost of sparse regression.
We propose selecting hyperparameters that minimize a weighted sum of a cross-validation metric and a model's output stability (a hypothetical sketch appears after this list).
Our confidence adjustment procedure reduces test set error by 2%, on average, on 13 real-world datasets.
arXiv Detail & Related papers (2023-06-26T17:02:45Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are needed, through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- A Statistics and Deep Learning Hybrid Method for Multivariate Time Series Forecasting and Mortality Modeling [0.0]
Exponential Smoothing Recurrent Neural Network (ES-RNN) is a hybrid between a statistical forecasting model and a recurrent neural network variant.
ES-RNN achieves a 9.4% improvement in absolute error in the Makridakis-4 Forecasting Competition.
arXiv Detail & Related papers (2021-12-16T04:44:19Z)
- Gaining Outlier Resistance with Progressive Quantiles: Fast Algorithms and Theoretical Studies [1.6457778420360534]
A framework of outlier-resistant estimation is introduced to robustify arbitrary loss functions.
A new technique is proposed to alleviate the requirement on the starting point, so that the number of data reestimations can be substantially reduced on regular datasets.
The obtained estimators, though not necessarily globally or even locally optimal, enjoy minimax optimality in both low and high dimensions.
arXiv Detail & Related papers (2021-12-15T20:35:21Z)
- Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures [100.55816326422773]
We study the benign overfitting phenomenon of the maximum margin classifier for linear classification problems.
Our results precisely characterize the condition under which benign overfitting can occur.
arXiv Detail & Related papers (2021-04-28T08:25:16Z)
- Robust regression with covariate filtering: Heavy tails and adversarial contamination [6.939768185086755]
We show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators that are simultaneously computationally and statistically efficient in this stronger contamination model.
We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
arXiv Detail & Related papers (2020-09-27T22:48:48Z)
- Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered weighted $L_1$ (OWL) regularized regression is a new regression analysis method for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression, which exploits the order of the primal solution even though its order structure is unknown a priori.
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
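
For the stacked-regressions entry above, here is a toy sketch of the nonnegative weight-learning step, assuming cross-fitted predictions as the design matrix; the paper's regularizer is omitted for brevity, and the base estimators are arbitrary placeholders.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict

def stack_weights(estimators, X, y, cv=5):
    """Learn nonnegative stacking weights from cross-fitted predictions.
    The paper minimizes a *regularized* empirical risk; plain nonnegative
    least squares is used here as a simplification."""
    # Each column holds one base estimator's out-of-fold predictions.
    P = np.column_stack([cross_val_predict(est, X, y, cv=cv)
                         for est in estimators])
    w, _ = nnls(P, y)  # min ||P w - y||^2  subject to  w >= 0
    return w

# Example: nonnegative weights for two arbitrary base estimators.
# w = stack_weights([LinearRegression(), Ridge(alpha=1.0)], X, y)
```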
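And for the stability-adjusted cross-validation entry, a hypothetical instantiation of the rule "minimize CV error plus weighted stability"; the instability measure used here (variance of per-fold lasso coefficients) and the weight are assumptions of this sketch, not the paper's construction.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def stability_adjusted_select(X, y, alphas, weight=0.5, n_splits=5, seed=0):
    """Pick the lasso alpha minimizing CV error + weight * instability,
    where instability is the variance of per-fold coefficient vectors."""
    best_alpha, best_score = None, np.inf
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for a in alphas:
        errs, coefs = [], []
        for tr, te in kf.split(X):
            m = Lasso(alpha=a).fit(X[tr], y[tr])
            errs.append(np.mean((m.predict(X[te]) - y[te]) ** 2))
            coefs.append(m.coef_)
        # Coefficient variance across folds proxies the model's instability.
        instability = np.var(np.stack(coefs), axis=0).mean()
        score = np.mean(errs) + weight * instability
        if score < best_score:
            best_alpha, best_score = a, score
    return best_alpha
```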