Double Generative Adversarial Networks for Conditional Independence Testing
- URL: http://arxiv.org/abs/2006.02615v3
- Date: Thu, 4 Nov 2021 23:11:27 GMT
- Title: Double Generative Adversarial Networks for Conditional Independence Testing
- Authors: Chengchun Shi and Tianlin Xu and Wicher Bergsma and Lexin Li
- Abstract summary: High-dimensional conditional independence testing is a key building block in statistics and machine learning.
We propose an inferential procedure based on double generative adversarial networks (GANs).
- Score: 8.359770027722275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we study the problem of high-dimensional conditional
independence testing, a key building block in statistics and machine learning.
We propose an inferential procedure based on double generative adversarial
networks (GANs). Specifically, we first introduce a double GANs framework to
learn two generators of the conditional distributions. We then integrate the
two generators to construct a test statistic, which takes the form of the
maximum of generalized covariance measures of multiple transformation
functions. We also employ data-splitting and cross-fitting to minimize the
conditions on the generators to achieve the desired asymptotic properties, and
employ multiplier bootstrap to obtain the corresponding $p$-value. We show that
the constructed test statistic is doubly robust, and the resulting test both
controls the type-I error and has power approaching one asymptotically. Notably,
we establish these theoretical guarantees under much weaker and
practically more feasible conditions compared to the existing tests, and our
proposal gives a concrete example of how to utilize some state-of-the-art deep
learning tools, such as GANs, to help address a classical but challenging
statistical problem. We demonstrate the efficacy of our test through both
simulations and an application to an anti-cancer drug dataset. A Python
implementation of the proposed procedure is available at
https://github.com/tianlinxu312/dgcit.
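The abstract compresses several moving parts: conditional-mean estimates obtained by integrating generator samples, a generalized covariance measure (GCM) maximized over a dictionary of transformation functions, and a multiplier bootstrap for the p-value. The sketch below illustrates that core arithmetic only; it is not the authors' implementation (that is the dgcit repository linked above). The oracle conditional sampler stands in for the trained GAN generators, and the helper names (`cond_mean`, `gcm`) and the tiny two-transform dictionary are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_mean(f, z, sampler, k=200):
    """Monte Carlo estimate of E[f(X) | Z] from a conditional sampler
    (standing in for a learned GAN generator of X given Z)."""
    return f(sampler(z, k)).mean(axis=1)

def gcm(u, v, mu_u, mu_v):
    """Generalized covariance measure: normalized sample covariance of
    the residuals u - E[u|Z] and v - E[v|Z]."""
    r = (u - mu_u) * (v - mu_v)
    n = len(r)
    return np.sqrt(n) * r.mean() / r.std(), r

# Toy data in which X is conditionally independent of Y given Z (the null).
n = 500
z = rng.standard_normal(n)
x = z + rng.standard_normal(n)
y = z + rng.standard_normal(n)

# Oracle conditional sampler X | Z = z ~ N(z, 1); the paper would plug in
# the two trained GAN generators here instead.
sampler = lambda zz, k: zz[:, None] + rng.standard_normal((len(zz), k))

# Small dictionary of transformation functions; the test statistic is the
# maximum absolute GCM over all transformation pairs.
transforms = [lambda v: v, np.tanh]
stats, prods = [], []
for f in transforms:
    for g in transforms:
        t, r = gcm(f(x), g(y), cond_mean(f, z, sampler), cond_mean(g, z, sampler))
        stats.append(t)
        prods.append(r)
t_obs = max(abs(t) for t in stats)

# Gaussian multiplier bootstrap for the max-type statistic's p-value:
# perturb the centered, standardized residual products with i.i.d. normals.
R = np.column_stack(prods)
Rc = (R - R.mean(axis=0)) / R.std(axis=0)
e = rng.standard_normal((2000, n))
t_boot = np.abs(e @ Rc).max(axis=1) / np.sqrt(n)
p_value = float((t_boot >= t_obs).mean())
```

Under the null and with accurate conditional means, the residual products are approximately mean-zero, so `t_obs` falls in the bulk of the bootstrap distribution and the p-value should not be small; the doubly robust property discussed in the abstract concerns what happens when one of the two conditional-mean estimates is misspecified.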
Related papers
- Doubly Robust Conditional Independence Testing with Generative Neural Networks [8.323172773256449]
This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$.
We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions.
arXiv Detail & Related papers (2024-07-25T01:28:59Z)
- A Kernel-Based Conditional Two-Sample Test Using Nearest Neighbors (with Applications to Calibration, Regression Curves, and Simulation-Based Inference) [3.622435665395788]
We introduce a kernel-based measure for detecting differences between two conditional distributions.
When the two conditional distributions are the same, the estimate has a Gaussian limit and its variance has a simple form that can be easily estimated from the data.
We also provide a resampling based test using our estimate that applies to the conditional goodness-of-fit problem.
arXiv Detail & Related papers (2024-07-23T15:04:38Z)
- Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z)
- Active Sequential Two-Sample Testing [18.99517340397671]
We consider the two-sample testing problem in a new scenario where sample measurements are inexpensive to access.
We devise the first active two-sample testing framework that queries sample measurements not only sequentially but also actively.
In practice, we introduce an instantiation of our framework and evaluate it using several experiments.
arXiv Detail & Related papers (2023-01-30T02:23:49Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Significance tests of feature relevance for a blackbox learner [6.72450543613463]
We derive two consistent tests for the feature relevance of a blackbox learner.
The first evaluates a loss difference with perturbation on an inference sample.
The second splits the inference sample into two but does not require data perturbation.
arXiv Detail & Related papers (2021-03-02T00:59:19Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.