Multiple Testing of Linear Forms for Noisy Matrix Completion
- URL: http://arxiv.org/abs/2312.00305v1
- Date: Fri, 1 Dec 2023 02:53:20 GMT
- Title: Multiple Testing of Linear Forms for Noisy Matrix Completion
- Authors: Wanteng Ma, Lilun Du, Dong Xia and Ming Yuan
- Abstract summary: We develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly.
We show that valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements.
- Score: 14.496082411670677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many important tasks of large-scale recommender systems can be naturally cast
as testing multiple linear forms for noisy matrix completion. These problems,
however, present unique challenges because of the subtle bias-and-variance
tradeoff of, and an intricate dependence among, the estimated entries induced by
the low-rank structure. In this paper, we develop a general approach to
overcome these difficulties by introducing new statistics for individual tests
with sharp asymptotics both marginally and jointly, and utilizing them to
control the false discovery rate (FDR) via a data splitting and symmetric
aggregation scheme. We show that valid FDR control can be achieved with
guaranteed power under nearly optimal sample size requirements using the
proposed methodology. Extensive numerical simulations and real data examples
are also presented to further illustrate its practical merits.
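The last step of the abstract is FDR control "via a data splitting and symmetric aggregation scheme". A generic mirror-statistic threshold of that kind can be sketched as follows. This is a minimal sketch under stated assumptions and is not the authors' procedure. The function name `symmetric_aggregation_threshold` and the inputs `w1`, `w2` are hypothetical: they are assumed to be standardized statistics for the same m hypotheses, one per hypothesis, computed independently on the two halves of the split data. Three further choices are assumptions: aggregating the two halves by their product, using the negative tail to estimate false discoveries, and omitting the "+1" offset.

```python
import numpy as np


def symmetric_aggregation_threshold(w1, w2, alpha=0.1):
    """Hypothetical sketch of FDR control by symmetric aggregation.

    Assumption: w1[j] and w2[j] are standardized statistics for hypothesis j,
    computed on two independent halves of the data. Under a true null they
    are roughly independent and centred at zero, so W = w1 * w2 is
    (approximately) symmetric about zero. Under a non-null both halves share
    the sign of the signal, so W tends to be large and positive.
    """
    w = np.asarray(w1, dtype=float) * np.asarray(w2, dtype=float)
    # Scan candidate thresholds from smallest to largest. The first one that
    # passes gives the infimum and hence the largest admissible rejection set.
    for t in np.sort(np.abs(w[w != 0])):
        # By symmetry of null W, the count in the negative tail estimates
        # how many nulls exceed t in the positive tail.
        false_est = np.sum(w <= -t)
        discoveries = np.sum(w >= t)
        if false_est / max(discoveries, 1) <= alpha:
            return np.flatnonzero(w >= t), t
    # No threshold achieves the target level, so nothing is rejected.
    return np.array([], dtype=int), np.inf
```

For example, take `w1 = [3, 2.5, 2, -0.5, 0.3, -1]`, `w2 = [2, 3, 1.5, 0.4, -0.8, 0.2]`, and `alpha = 0.1`. The aggregated statistics are `[6, 7.5, 3, -0.2, -0.24, -0.2]`, and the scan stops at `t = 3.0` with hypotheses 0, 1 and 2 selected. The design relies on symmetry rather than on individual p-values. Because no p-values are needed, the procedure does not require the joint null distribution of the dependent estimated entries. It only needs each null aggregated statistic to be roughly symmetric about zero.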
Related papers
- Sample Complexity of Linear Quadratic Regulator Without Initial Stability [11.98212766542468]
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown parameters.
Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity.
(2025-02-20T02:44:25Z)
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
(2024-08-25T04:07:18Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
(2024-04-11T09:23:36Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
(2023-12-19T13:17:43Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
(2023-11-02T06:09:24Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
(2023-08-28T18:48:34Z)
- Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control [11.011242089340438]
In high-dimensional variable selection problems, statisticians often seek to design multiple testing procedures that control the False Discovery Rate (FDR).
We introduce Model-X procedures that provably control the frequentist FDR from finite samples, even when the model is misspecified.
Our proposed procedure, PoEdCe, incorporates three key ingredients: Posterior Expectation, the distilled conditional randomization test (dCRT), and the Benjamini-Hochberg procedure with e-values.
(2022-11-04T22:56:41Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
(2022-04-13T12:43:12Z)
- Calibrating Over-Parametrized Simulation Models: A Framework via Eligibility Set [3.862247454265944]
We develop a framework for constructing calibration schemes that satisfy rigorous frequentist statistical guarantees.
We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator.
(2021-05-27T00:59:29Z)
- Effective multi-view registration of point sets based on Student's t mixture model [15.441928157356477]
This paper proposes an effective registration method based on Student's t Mixture Model (StMM).
It achieves multi-view registration more efficiently because all t-distribution centroids can be obtained by nearest-neighbor (NN) search.
Experimental results illustrate its superior performance and accuracy over state-of-the-art methods.
(2020-12-13T08:27:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.