Sparse Linear Regression when Noises and Covariates are Heavy-Tailed and Contaminated by Outliers
- URL: http://arxiv.org/abs/2408.01336v1
- Date: Fri, 2 Aug 2024 15:33:04 GMT
- Title: Sparse Linear Regression when Noises and Covariates are Heavy-Tailed and Contaminated by Outliers
- Authors: Takeyuki Sasai, Hironori Fujisawa
- Abstract summary: We investigate the problem of estimating the coefficients of linear regression under a sparsity assumption.
We consider the situation where covariates and noises are not only sampled from heavy-tailed distributions but also contaminated by outliers.
Our estimators can be computed efficiently and exhibit sharp error bounds.
- Score: 2.0257616108612373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of estimating the coefficients of linear regression under a sparsity assumption when covariates and noises are sampled from heavy-tailed distributions. Additionally, we consider the situation where covariates and noises are not only sampled from heavy-tailed distributions but also contaminated by outliers. Our estimators can be computed efficiently and exhibit sharp error bounds.
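To make the setting concrete, here is a minimal sketch of sparse robust regression using an L1-penalized Huber loss solved by proximal gradient descent. This is a standard baseline, not the estimator proposed in the paper: it only bounds the influence of heavy-tailed or outlying responses and does nothing about heavy-tailed or contaminated covariates. The function names (`soft_threshold`, `l1_huber_regression`) and the parameters `lam`, `delta`, and `n_iter` are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def l1_huber_regression(X, y, lam, delta=1.345, n_iter=500):
    """Minimize (1/n) * sum_i huber_delta(x_i @ beta - y_i) + lam * ||beta||_1.

    Illustrative baseline only (assumption: not the paper's method).
    """
    n, d = X.shape
    # The Huber loss has a 1-Lipschitz derivative, so the smooth part has
    # gradient Lipschitz constant at most ||X||_2^2 / n.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / lipschitz
    beta = np.zeros(d)
    for _ in range(n_iter):
        residual = X @ beta - y
        # Derivative of the Huber loss: the residual clipped to [-delta, delta].
        psi = np.clip(residual, -delta, delta)
        grad = X.T @ psi / n
        # Gradient step on the smooth part, then the L1 proximal step.
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 50
    X = rng.standard_normal((n, d))
    beta_true = np.zeros(d)
    beta_true[:3] = 3.0                                  # sparse signal
    y = X @ beta_true + rng.standard_t(3, size=n)        # heavy-tailed noise
    y[:10] = 50.0                                        # a few gross outliers
    beta_hat = l1_huber_regression(X, y, lam=0.1)
    print("estimation error:", np.linalg.norm(beta_hat - beta_true))
```

Clipping the residual caps how much any single response can move the gradient, which is why a handful of gross outliers in `y` does not dominate the fit. Handling outliers in the covariates as well, as the abstract describes, would need an additional mechanism that this sketch does not attempt.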
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Robust Gaussian Process Regression with Huber Likelihood [2.7184224088243365]
We propose a robust process model in the Gaussian process framework with the likelihood of observed data expressed as the Huber probability distribution.
The proposed model employs weights based on projection statistics to scale residuals and bound the influence of vertical outliers and bad leverage points on the latent function estimates.
arXiv Detail & Related papers (2023-01-19T02:59:33Z)
- Outlier Robust and Sparse Estimation of Linear Regression Coefficients [2.0257616108612373]
We consider outlier-robust and sparse estimation of linear regression coefficients.
Our results present sharper error bounds under weaker assumptions than prior studies that share similar interests with this study.
arXiv Detail & Related papers (2022-08-24T14:56:52Z)
- Robust and Sparse Estimation of Linear Regression Coefficients with Heavy-tailed Noises and Covariates [0.0]
Our estimator can be computed efficiently. Further, our estimation error bound is sharp.
The situation addressed in this paper is that covariates and noises are sampled from heavy-tailed distributions, and the covariates and noises are contaminated by malicious outliers.
arXiv Detail & Related papers (2022-06-15T15:23:24Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding [101.35070661471124]
We show that unobserved confounding leaves a characteristic footprint in the observed data distribution that allows for disentangling spurious and causal effects.
We propose an adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems.
arXiv Detail & Related papers (2021-03-28T11:07:59Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Adversarial robust weighted Huber regression [2.0257616108612373]
We consider a robust estimation of linear regression coefficients.
We derive an estimation error bound, which depends on the stable rank and the condition number of the covariance matrix.
arXiv Detail & Related papers (2021-02-22T15:50:34Z)
- Robust regression with covariate filtering: Heavy tails and adversarial contamination [6.939768185086755]
We show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators that are simultaneously computationally and statistically efficient in the stronger contamination model.
We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.
arXiv Detail & Related papers (2020-09-27T22:48:48Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.