Robust High-dimensional Tuning Free Multiple Testing
- URL: http://arxiv.org/abs/2211.11959v2
- Date: Wed, 23 Nov 2022 18:20:18 GMT
- Title: Robust High-dimensional Tuning Free Multiple Testing
- Authors: Jianqing Fan, Zhipeng Lou, Mengxin Yu
- Abstract summary: This paper revisits the celebrated Hodges-Lehmann (HL) estimator for estimating location parameters in both the one- and two-sample problems.
We develop Berry-Esseen inequality and Cramér-type moderate deviation for the HL estimator based on a newly developed non-asymptotic Bahadur representation.
It is convincingly shown that the resulting tuning-free and moment-free methods control false discovery proportion at a prescribed level.
- Score: 0.49416305961918056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A stylized feature of high-dimensional data is that many variables have heavy
tails, and robust statistical inference is critical for valid large-scale
statistical inference. Yet, the existing developments such as Winsorization,
Huberization and median of means require the bounded second moments and involve
variable-dependent tuning parameters, which hamper their fidelity in
applications to large-scale problems. To liberate these constraints, this paper
revisits the celebrated Hodges-Lehmann (HL) estimator for estimating location
parameters in both the one- and two-sample problems, from a non-asymptotic
perspective. Our study develops Berry-Esseen inequality and Cramér type
moderate deviation for the HL estimator based on newly developed non-asymptotic
Bahadur representation, and builds data-driven confidence intervals via a
weighted bootstrap approach. These results allow us to extend the HL estimator
to large-scale studies and propose tuning-free and moment-free
high-dimensional inference procedures for testing global null and for
large-scale multiple testing with false discovery proportion control. It is
convincingly shown that the resulting tuning-free and moment-free methods
control false discovery proportion at a prescribed level. The simulation
studies lend further support to our developed theory.
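
For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of the textbook Hodges-Lehmann estimator, not the authors' implementation. It assumes the common convention of taking the median of pairwise averages over pairs i < j in the one-sample case and the median of all pairwise differences in the two-sample case; the paper may use a different convention. The function names `hl_one_sample` and `hl_two_sample` are hypothetical, and the bootstrap confidence intervals and multiple-testing procedures from the abstract are not covered here.

```python
from itertools import combinations, product
from statistics import median


def hl_one_sample(x):
    """One-sample HL location estimate: median of pairwise averages (i < j).

    Assumes at least two observations; the i < j convention is an assumption.
    """
    return median((a + b) / 2 for a, b in combinations(x, 2))


def hl_two_sample(x, y):
    """Two-sample HL shift estimate: median of all differences x_i - y_j."""
    return median(a - b for a, b in product(x, y))


# A single gross outlier barely moves the estimate, unlike the sample mean (22.0).
print(hl_one_sample([1, 2, 3, 4, 100]))    # 3.25
print(hl_two_sample([5, 6, 7], [1, 2, 3]))  # 4
```

Neither function takes a tuning parameter or relies on a moment assumption, which is the property the abstract emphasizes. The quadratic number of pairs makes this direct form suitable only for moderate sample sizes.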
Related papers
- Estimation and Inference for Causal Functions with Multiway Clustered Data [6.988496457312806]
This paper proposes methods of estimation and uniform inference for a general class of causal functions.
The causal function is identified as a conditional expectation of an adjusted (Neyman-orthogonal) signal.
We apply the proposed methods to analyze the causal relationship between mistrust levels in Africa and the historical slave trade.
arXiv Detail & Related papers (2024-09-10T17:17:53Z)
- A sparse PAC-Bayesian approach for high-dimensional quantile prediction [0.0]
This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction.
It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation.
Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.
arXiv Detail & Related papers (2024-09-03T08:01:01Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent Estimator [0.0]
In this paper, we investigate the theoretical properties of stochastic gradient descent (SGD) for statistical inference in the context of nonconvex problems.
We propose two inferential procedures for nonconvex objective functions, which may contain multiple local minima.
arXiv Detail & Related papers (2023-06-03T22:08:10Z)
- Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with a bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
arXiv Detail & Related papers (2023-01-30T04:07:59Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Divergence Frontiers for Generative Models: Sample Complexity, Quantization Level, and Frontier Integral [58.434753643798224]
Divergence frontiers have been proposed as an evaluation framework for generative models.
We establish non-asymptotic bounds on the sample complexity of the plug-in estimator of divergence frontiers.
We also augment the divergence frontier framework by investigating the statistical performance of smoothed distribution estimators.
arXiv Detail & Related papers (2021-06-15T06:26:25Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
- Support estimation in high-dimensional heteroscedastic mean regression [2.28438857884398]
We consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors.
We use a strictly convex, smooth variant of the Huber loss function with tuning parameter depending on the parameters of the problem.
For the resulting estimator we show sign-consistency and optimal rates of convergence in the $\ell_\infty$ norm.
arXiv Detail & Related papers (2020-11-03T09:46:31Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)