Nearest-Neighbor Sampling Based Conditional Independence Testing
- URL: http://arxiv.org/abs/2304.04183v1
- Date: Sun, 9 Apr 2023 07:54:36 GMT
- Title: Nearest-Neighbor Sampling Based Conditional Independence Testing
- Authors: Shuai Li, Ziqi Chen, Hongtu Zhu, Christina Dan Wang, Wang Wen
- Abstract summary: The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given random variables Z.
The aim of this paper is to develop a novel alternative to the CRT that uses nearest-neighbor sampling without assuming the exact form of the distribution of X given Z.
- Score: 15.478671471695794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The conditional randomization test (CRT) was recently proposed to test
whether two random variables X and Y are conditionally independent given random
variables Z. The CRT assumes that the conditional distribution of X given Z is
known under the null hypothesis and then it is compared to the distribution of
the observed samples of the original data. The aim of this paper is to develop
a novel alternative to the CRT that uses nearest-neighbor sampling without assuming
the exact form of the distribution of X given Z. Specifically, we utilize the
computationally efficient 1-nearest-neighbor to approximate the conditional
distribution that encodes the null hypothesis. Then, theoretically, we show
that the distribution of the generated samples is very close to the true
conditional distribution in terms of total variation distance. Furthermore, we
take the classifier-based conditional mutual information estimator as our test
statistic. As an empirical version of a fundamental information-theoretic
quantity, this statistic captures conditional dependence well. We show
that our proposed test is computationally very fast, while controlling type I
and II errors quite well. Finally, we demonstrate the efficiency of our
proposed test in both synthetic and real data analyses.
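The 1-nearest-neighbor sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a scalar Z, uses the absolute Pearson correlation as a stand-in for the classifier-based conditional mutual information estimator, and obtains repeated draws by re-subsampling a held-out pool. The function names (`nn_sample_x`, `abs_corr`, `nn_crt_pvalue`) and the splitting scheme are hypothetical choices made for illustration.

```python
import math
import random


def nn_sample_x(z_query, z_pool, x_pool):
    """For each query z, return the x paired with the nearest pool z (1-NN)."""
    sampled = []
    for zq in z_query:
        j = min(range(len(z_pool)), key=lambda k: abs(z_pool[k] - zq))
        sampled.append(x_pool[j])
    return sampled


def abs_corr(a, b):
    """Absolute Pearson correlation (stand-in statistic, not the paper's CMI)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def nn_crt_pvalue(x, y, z, n_rep=99, seed=0):
    """CRT-style p-value with X | Z approximated by 1-NN sampling (scalar Z)."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    half = len(idx) // 2
    test, pool = idx[:half], idx[half:]  # test half and disjoint donor pool

    x_t = [x[i] for i in test]
    y_t = [y[i] for i in test]
    z_t = [z[i] for i in test]
    t_obs = abs_corr(x_t, y_t)

    exceed = 0
    for _ in range(n_rep):
        # Assumption: a fresh random half of the pool per replicate, so that
        # the nearest neighbors (and hence the pseudo-samples) vary.
        sub = rng.sample(pool, max(1, len(pool) // 2))
        x_tilde = nn_sample_x(z_t, [z[i] for i in sub], [x[i] for i in sub])
        if abs_corr(x_tilde, y_t) >= t_obs:
            exceed += 1
    # Standard randomization-test p-value with the +1 correction.
    return (1 + exceed) / (1 + n_rep)
```

Each pseudo-sample borrows the X value of the pool point whose Z is closest, so it mimics a draw from X given Z without using Y. A small returned value suggests that X carries information about Y beyond what Z explains.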
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Controllable Generation via Locally Constrained Resampling [77.48624621592523]
We propose a tractable probabilistic approach that performs Bayesian conditioning to draw samples subject to a constraint.
Our approach considers the entire sequence, leading to a more globally optimal constrained generation than current greedy methods.
We show that our approach is able to steer the model's outputs away from toxic generations, outperforming similar approaches to detoxification.
arXiv Detail & Related papers (2024-10-17T00:49:53Z)
- Doubly Robust Conditional Independence Testing with Generative Neural Networks [8.323172773256449]
This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$.
We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions.
arXiv Detail & Related papers (2024-07-25T01:28:59Z)
- A Kernel-Based Conditional Two-Sample Test Using Nearest Neighbors (with Applications to Calibration, Regression Curves, and Simulation-Based Inference) [3.622435665395788]
We introduce a kernel-based measure for detecting differences between two conditional distributions.
When the two conditional distributions are the same, the estimate has a Gaussian limit and its variance has a simple form that can be easily estimated from the data.
We also provide a resampling based test using our estimate that applies to the conditional goodness-of-fit problem.
arXiv Detail & Related papers (2024-07-23T15:04:38Z)
- Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
- DensePure: Understanding Diffusion Models towards Adversarial Robustness [110.84015494617528]
We analyze the properties of diffusion models and establish the conditions under which they can enhance certified robustness.
We propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., a classifier).
We show that this robust region is a union of multiple convex sets, and is potentially much larger than the robust regions identified in previous works.
arXiv Detail & Related papers (2022-11-01T08:18:07Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- Wasserstein Generative Learning of Conditional Distribution [6.051520664893158]
We propose a Wasserstein generative approach to learning a conditional distribution.
We establish a non-asymptotic error bound for the conditional sampling distribution generated by the proposed method.
arXiv Detail & Related papers (2021-12-19T01:55:01Z)
- Adversarial sampling of unknown and high-dimensional conditional distributions [0.0]
In this paper, the sampling method and the inference of the underlying distribution are both handled with a data-driven method known as generative adversarial networks (GANs).
A GAN trains two competing neural networks to produce a network that can effectively generate samples from the training set distribution.
It is shown that all the versions of the proposed algorithm effectively sample the target conditional distribution with minimal impact on the quality of the samples.
arXiv Detail & Related papers (2021-11-08T12:23:38Z)
- Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data.
Our test requires essentially no assumptions on the distributions.
By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z)
- Testing Goodness of Fit of Conditional Density Models with Kernels [16.003516725803774]
We propose two nonparametric statistical tests of goodness of fit for conditional distributions.
We show that our tests are consistent against any fixed alternative conditional model.
We demonstrate the interpretability of our test on a task of modeling the distribution of New York City's taxi drop-off location.
arXiv Detail & Related papers (2020-02-24T14:04:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.