Bootstrapped Edge Count Tests for Nonparametric Two-Sample Inference
Under Heterogeneity
- URL: http://arxiv.org/abs/2304.13848v1
- Date: Wed, 26 Apr 2023 22:25:44 GMT
- Authors: Trambak Banerjee, Bhaswar B. Bhattacharya, Gourab Mukherjee
- Abstract summary: We develop a new nonparametric testing procedure that accurately detects differences between the two samples.
A comprehensive simulation study and an application to detecting aberrant user behaviors in online games demonstrate the excellent non-asymptotic performance of the proposed test.
- Score: 5.8010446129208155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nonparametric two-sample testing is a classical problem in inferential
statistics. While modern two-sample tests, such as the edge count test and its
variants, can handle multivariate and non-Euclidean data, contemporary
gargantuan datasets often exhibit heterogeneity due to the presence of latent
subpopulations. Direct application of these tests, without regulating for such
heterogeneity, may lead to incorrect statistical decisions. We develop a new
nonparametric testing procedure that accurately detects differences between the
two samples in the presence of unknown heterogeneity in the data generation
process. Our framework handles this latent heterogeneity through a composite
null that entertains the possibility that the two samples arise from a mixture
distribution with identical component distributions but with possibly different
mixing weights. In this regime, we study the asymptotic behavior of the weighted
edge count test statistic and show that it can be effectively re-calibrated to
detect arbitrary deviations from the composite null. For practical
implementation we propose a Bootstrapped Weighted Edge Count test which
involves a bootstrap-based calibration procedure that can be easily implemented
across a wide range of heterogeneous regimes. A comprehensive simulation study
and an application to detecting aberrant user behaviors in online games
demonstrate the excellent non-asymptotic performance of the proposed test.
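As a rough illustration of the building block the abstract refers to, the sketch below computes a weighted edge-count statistic on a k-nearest-neighbor graph and calibrates it by permuting sample labels. This is not the authors' Bootstrapped Weighted Edge Count procedure: permutation calibration targets the simple null of identical distributions, whereas the paper calibrates by bootstrap under a composite null with differing mixing weights, and that step is not reproduced here. The function names, the choice of a k-NN graph, the weights `(n2 - 1)/(N - 2)` and `(n1 - 1)/(N - 2)`, and all parameter defaults are assumptions made for illustration.

```python
import numpy as np


def knn_edges(points, k=5):
    """Undirected k-nearest-neighbor graph as an array of (i, j) pairs, i < j."""
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)          # pairwise squared Euclidean distances
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest points
    edges = {
        (min(i, int(j)), max(i, int(j)))
        for i in range(len(points))
        for j in nbrs[i]
    }
    return np.array(sorted(edges))


def weighted_edge_count(edges, labels):
    """Weighted count of within-sample edges (labels are 0 or 1).

    Assumed form: ((n2 - 1) * R1 + (n1 - 1) * R2) / (N - 2), where R1 and R2
    count edges joining two points of sample 0 and of sample 1, respectively.
    """
    n1 = int((labels == 0).sum())
    n2 = int((labels == 1).sum())
    a, b = labels[edges[:, 0]], labels[edges[:, 1]]
    r1 = int(((a == 0) & (b == 0)).sum())
    r2 = int(((a == 1) & (b == 1)).sum())
    return ((n2 - 1) * r1 + (n1 - 1) * r2) / (n1 + n2 - 2)


def permutation_pvalue(x, y, k=5, n_perm=500, seed=0):
    """Observed statistic and one-sided permutation p-value (large values reject)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    labels = np.concatenate(
        [np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    )
    edges = knn_edges(pooled, k)            # graph is built once on pooled data
    observed = weighted_edge_count(edges, labels)
    null = np.array(
        [weighted_edge_count(edges, rng.permutation(labels)) for _ in range(n_perm)]
    )
    p_value = (1 + int((null >= observed).sum())) / (1 + n_perm)
    return observed, p_value
```

Calling `permutation_pvalue(x, y)` on two arrays of shape `(n, d)` returns the statistic and its p-value. When the samples share components but differ in mixing weights, this permutation calibration would tend to reject, which is the failure mode the abstract says the composite-null calibration is designed to avoid.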
Related papers
- A Kernel-Based Conditional Two-Sample Test Using Nearest Neighbors (with Applications to Calibration, Regression Curves, and Simulation-Based Inference) [3.622435665395788]
We introduce a kernel-based measure for detecting differences between two conditional distributions.
When the two conditional distributions are the same, the estimate has a Gaussian limit and its variance has a simple form that can be easily estimated from the data.
We also provide a resampling based test using our estimate that applies to the conditional goodness-of-fit problem.
arXiv Detail & Related papers (2024-07-23T15:04:38Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed procedure accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Active Sequential Two-Sample Testing [18.99517340397671]
We consider the two-sample testing problem in a new scenario where sample measurements are inexpensive to access.
We devise the first active sequential two-sample testing framework that not only sequentially but also actively queries.
In practice, we introduce an instantiation of our framework and evaluate it using several experiments.
arXiv Detail & Related papers (2023-01-30T02:23:49Z)
- Statistical and Computational Phase Transitions in Group Testing [73.55361918807883]
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
We consider two different simple random procedures for assigning individuals to tests.
arXiv Detail & Related papers (2022-06-15T16:38:50Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
- Nonparametric Empirical Bayes Estimation and Testing for Sparse and Heteroscedastic Signals [5.715675926089834]
Large-scale modern data often involves estimation and testing for high-dimensional unknown parameters.
It is desirable to identify the sparse signals, "the needles in the haystack", with accuracy and false discovery control.
We propose a novel Spike-and-Nonparametric mixture prior (SNP) -- a spike to promote the sparsity and a nonparametric structure to capture signals.
arXiv Detail & Related papers (2021-06-16T15:55:44Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models [2.07706336594149]
We propose a new adaptive hypothesis test for inequality (e.g., monotonicity, convexity) and equality (e.g., parametric, semiparametric) restrictions on a structural function in a nonparametric instrumental variables (NPIV) model.
Our test adapts to the unknown smoothness of alternative functions in the presence of unknown degree of endogeneity and unknown strength of the instruments.
arXiv Detail & Related papers (2020-06-17T01:19:13Z)
- Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, in terms of which commonly used models like the Dirichlet process and the beta-Bernoulli process can be expressed, are decomposable into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)
- Asymptotic Validity and Finite-Sample Properties of Approximate Randomization Tests [2.28438857884398]
Our key theoretical contribution is a non-asymptotic bound on the discrepancy between the size of an approximate randomization test and the size of the original randomization test using noiseless data.
We illustrate our theory through several examples, including tests of significance in linear regression.
arXiv Detail & Related papers (2019-08-12T16:09:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.