Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution
- URL: http://arxiv.org/abs/2405.00424v2
- Date: Wed, 24 Jul 2024 15:59:56 GMT
- Title: Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution
- Authors: Zhaoxing Gao, Ruey S. Tsay,
- Abstract summary: We introduce an iterative strategy to correct bias effectively when the dimension $p$ is less than the sample size $n$.
For $p>n$, our method optimally mitigates the bias such that any remaining bias in the proposed de-biased estimator is unattainable through linear transformations of the response data.
Our method offers a transformative solution to the bias challenge in ridge regression inferences across various disciplines.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ridge regression is an indispensable tool in big data analysis. Yet its inherent bias poses a significant and longstanding challenge, compromising both statistical efficiency and scalability across various applications. To tackle this critical issue, we introduce an iterative strategy to correct bias effectively when the dimension $p$ is less than the sample size $n$. For $p>n$, our method optimally mitigates the bias such that any remaining bias in the proposed de-biased estimator is unattainable through linear transformations of the response data. To address the remaining bias when $p>n$, we employ a Ridge-Screening (RS) method, producing a reduced model suitable for bias correction. Crucially, under certain conditions, the true model is nested within our selected one, highlighting RS as a novel variable selection approach. Through rigorous analysis, we establish the asymptotic properties and valid inferences of our de-biased ridge estimators for both $p<n$ and $p>n$, where both $p$ and $n$ may increase toward infinity, along with the number of iterations. We further validate these results using simulated and real-world data examples. Our method offers a transformative solution to the bias challenge in ridge regression inferences across various disciplines.
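As a rough illustration of the $p<n$ regime, the sketch below implements one plausible iterative bias-correction scheme: since the ridge bias satisfies $\mathrm{bias}(\hat\beta_\lambda) = -\lambda (X^\top X + \lambda I_p)^{-1}\beta$, the current estimate can be plugged back in to correct itself. This is a minimal sketch under standard assumptions, not the authors' exact algorithm.

```python
# Minimal sketch of iterative ridge bias correction (p < n); illustrative,
# not the paper's exact procedure. Uses bias(beta_ridge) = -lam * G @ beta
# with G = (X'X + lam I)^{-1}, plugging the current estimate back in.
import numpy as np

def iterative_debiased_ridge(X, y, lam, n_iter=100):
    p = X.shape[1]
    G = np.linalg.inv(X.T @ X + lam * np.eye(p))   # (X'X + lam I)^{-1}
    beta_ridge = G @ (X.T @ y)                     # ordinary ridge estimate
    beta = beta_ridge.copy()
    for _ in range(n_iter):
        beta = beta_ridge + lam * (G @ beta)       # add back estimated bias
    return beta

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))
beta_true = np.r_[np.ones(5), np.zeros(p - 5)]
y = X @ beta_true + rng.standard_normal(n)
print(iterative_debiased_ridge(X, y, lam=10.0)[:5])  # near 1 after correction
```

Because the eigenvalues of $\lambda (X^\top X + \lambda I_p)^{-1}$ lie in $(0,1)$ when $p<n$, this iteration converges (to the OLS solution in this toy form); the paper's interest is in the joint asymptotics as $p$, $n$, and the iteration count grow.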
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free $L_2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
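For context, HAR sits inside the kernel ridge regression family. The sketch below is generic KRR with an RBF kernel standing in as an assumption: HAR's actual data-adaptive spline-basis kernel is more involved and is not reproduced here.

```python
# Generic kernel ridge regression. The RBF kernel is a stand-in: HAR's
# actual kernel is data-adaptive, built from a saturated zero-order
# tensor-product spline basis, and is not reproduced here.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def krr_fit_predict(X, y, X_new, lam=1.0, gamma=1.0):
    K = rbf_kernel(X, X, gamma)                    # n x n Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return rbf_kernel(X_new, X, gamma) @ alpha     # k(x_new, X) @ alpha
```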
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$, where $M$ is the model size and $N$ is the data size.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
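A quick numerical reading of that rate: under a compute budget $C \approx MN$, the two error terms trade off against each other. The snippet below sweeps the parameter/data split to locate the balance point; the exponent $a$ and budget $C$ are made-up illustrative values.

```python
# Sweep the parameter/data split under a fixed compute budget C ~ M * N and
# minimize the stated rate M^{-(a-1)} + N^{-(a-1)/a}. The exponent a and
# budget C are arbitrary illustrative values.
import numpy as np

a, C = 2.0, 1e8
M = np.logspace(2, 6, 400)                 # candidate model sizes
N = C / M                                  # data sizes implied by the budget
risk = M ** (-(a - 1)) + N ** (-(a - 1) / a)
print(f"best M under budget: {M[np.argmin(risk)]:.3g}")
```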
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n \mid b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n \mid b_n)}$.
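A minimal sketch of that reweighting, assuming discrete attributes and estimating $p(u_n \mid b_n)$ by simple counting (the array names are illustrative):

```python
# Weight each sample n by 1 / p(u_n | b_n), with the conditional probability
# estimated by counting over discrete class attribute u and bias attribute b.
import numpy as np

def inverse_conditional_weights(u, b):
    u, b = np.asarray(u), np.asarray(b)
    weights = np.empty(len(u), dtype=float)
    for i in range(len(u)):
        mask = b == b[i]                          # samples sharing b_n
        p_u_given_b = (u[mask] == u[i]).mean()    # empirical p(u_n | b_n)
        weights[i] = 1.0 / p_u_given_b
    return weights

u = [0, 0, 0, 1, 1, 1]       # class attribute
b = [0, 0, 1, 1, 1, 0]       # correlated non-class attribute
print(inverse_conditional_weights(u, b))
```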
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - The Adaptive $τ$-Lasso: Robustness and Oracle Properties [12.06248959194646]
This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets.
The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points.
In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best or close-to-best performance.
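The "adaptive" part can be sketched in isolation: per-coefficient penalty weights computed from a pilot fit, so coefficients the pilot deems large are shrunk less. Pairing these weights with the robust $\tau$-loss, as the paper actually does, requires an iteratively reweighted solver and is omitted here.

```python
# Adaptive-lasso-style penalty weights from a pilot estimate; the paper
# pairs such weights with a robust tau-loss, which is not reproduced here.
import numpy as np

def adaptive_penalty_weights(beta_pilot, lam=0.1, gamma=1.0, eps=1e-8):
    # large |pilot coefficient|  ->  small penalty on that coordinate
    return lam / (np.abs(beta_pilot) + eps) ** gamma
```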
arXiv Detail & Related papers (2023-04-18T21:34:14Z) - Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study penalized robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
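To make the objective concrete, the sketch below solves an L1-penalized expectile regression by plain proximal gradient descent; the paper's semismooth Newton coordinate descent solver (and its robustified loss) is considerably more refined.

```python
# L1-penalized expectile regression via plain proximal gradient descent.
# The paper's retire estimator robustifies the expectile loss (Huber-type)
# and solves it with semismooth Newton coordinate descent; this sketch
# keeps only the plain asymmetric squared loss.
import numpy as np

def expectile_grad(X, y, beta, tau):
    r = y - X @ beta
    w = np.where(r < 0, 1 - tau, tau)          # asymmetric weights
    return -2 * X.T @ (w * r) / len(y)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_expectile(X, y, tau=0.8, lam=0.05, step=0.01, iters=2000):
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta = soft_threshold(beta - step * expectile_grad(X, y, beta, tau),
                              step * lam)
    return beta
```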
arXiv Detail & Related papers (2022-12-11T18:03:12Z) - Streaming Sparse Linear Regression [1.8707139489039097]
We propose a novel online sparse linear regression framework for analyzing streaming data when data points arrive sequentially.
Our proposed method is memory efficient and requires less stringent restricted strong convexity assumptions.
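Generically, streaming sparse regression reduces to one cheap sparsity-preserving update per arriving point. The sketch below uses online proximal SGD with soft-thresholding as a stand-in for the paper's estimator.

```python
# One online update of L1-regularized least squares per arriving (x_t, y_t);
# a generic stand-in for the paper's streaming estimator.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def streaming_update(beta, x_t, y_t, lr=0.01, lam=0.1):
    grad = (x_t @ beta - y_t) * x_t            # gradient of 0.5*(x'b - y)^2
    return soft_threshold(beta - lr * grad, lr * lam)

beta = np.zeros(10)
rng = np.random.default_rng(1)
for _ in range(1000):                          # simulated stream
    x_t = rng.standard_normal(10)
    y_t = 2.0 * x_t[0] + 0.1 * rng.standard_normal()
    beta = streaming_update(beta, x_t, y_t)
print(beta.round(2))                           # mass concentrates on coord 0
```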
arXiv Detail & Related papers (2022-11-11T07:31:55Z) - Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% across four benchmarks.
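In spirit, Bias Mimicking subsamples each class so that its bias-attribute distribution mimics another class's, breaking the class-bias correlation without dropping entire classes. The sketch below shows one such subsample for a chosen target class; the paper's full procedure (per-class versions and training schedule) is more elaborate.

```python
# Subsample every non-target class so its bias-attribute distribution
# mimics the target class's; a rough sketch of the idea, not the paper's
# full training procedure.
import numpy as np

def mimic_subsample(labels, bias, target_class, rng):
    labels, bias = np.asarray(labels), np.asarray(bias)
    vals, cnts = np.unique(bias[labels == target_class], return_counts=True)
    dist = cnts / cnts.sum()                   # target bias distribution
    keep = list(np.flatnonzero(labels == target_class))
    for c in np.unique(labels):
        if c == target_class:
            continue
        idx = np.flatnonzero(labels == c)
        counts = np.array([(bias[idx] == v).sum() for v in vals])
        n_max = (counts / dist).min()          # largest matchable subsample
        for v, d in zip(vals, dist):
            take = int(n_max * d)
            if take > 0:
                pool = idx[bias[idx] == v]
                keep += list(rng.choice(pool, size=take, replace=False))
    return np.sort(np.array(keep))
```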
arXiv Detail & Related papers (2022-09-30T17:33:00Z) - A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension [36.00360315353985]
CRT-logit is an algorithm that combines a variable-distillation step and a decorrelation step.
We provide a theoretical analysis of this procedure, and demonstrate its effectiveness on simulations, along with experiments on large-scale brain-imaging and genomics datasets.
arXiv Detail & Related papers (2022-05-29T09:37:16Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
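To fix ideas, the model's log-likelihood can be sketched with SciPy's generalized normal CDF as the link; note the scaling-convention caveat in the comments, so treat this as illustrative only.

```python
# Negative log-likelihood of a probit-type GLM whose link is the CDF of a
# generalized normal with shape p. Note scipy's gennorm standardizes as
# exp(-|x|^p), while the p-generalized distribution uses exp(-|x|^p / p);
# this sketch ignores that rescaling.
import numpy as np
from scipy.stats import gennorm

def pgen_probit_nll(coef, X, y, p=2.0):
    prob = gennorm.cdf(X @ coef, p)            # P(y=1 | x), y in {0, 1}
    prob = np.clip(prob, 1e-12, 1 - 1e-12)     # numerical safety
    return -np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob))
```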
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - Adiabatic Quantum Feature Selection for Sparse Linear Regression [0.17499351967216337]
We formulate feature selection for sparse linear regression as a quadratic unconstrained binary optimization (QUBO) problem and compare the quality of the QUBO solutions on synthetic and real-world datasets.
The results demonstrate the effectiveness of the proposed adiabatic quantum computing approach in finding the optimal solution.
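The QUBO construction itself is mechanical: with binary indicators $z$ ($z_j = 1$ keeps feature $j$) and, for illustration, unit coefficients, $\|y - Xz\|^2$ expands into a quadratic form in $z$. A hedged sketch follows; the paper's exact formulation may add penalty terms.

```python
# Cast feature selection with unit coefficients as a QUBO: using z_j^2 = z_j,
# ||y - X z||^2 = z'(X'X)z - 2(X'y)'z + const, so folding the linear term
# into the diagonal gives the QUBO matrix. Illustrative; the paper's exact
# formulation may differ.
import numpy as np
from itertools import product

def qubo_matrix(X, y):
    Q = X.T @ X
    Q[np.diag_indices_from(Q)] -= 2 * (X.T @ y)   # absorb linear term
    return Q

def brute_force_qubo(Q):                          # stand-in for the annealer
    best = min(product([0, 1], repeat=Q.shape[0]),
               key=lambda z: np.asarray(z) @ Q @ np.asarray(z))
    return np.array(best)
```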
arXiv Detail & Related papers (2021-06-04T09:14:01Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.