On the Hardness of Conditional Independence Testing In Practice
- URL: http://arxiv.org/abs/2512.14000v1
- Date: Tue, 16 Dec 2025 01:45:23 GMT
- Title: On the Hardness of Conditional Independence Testing In Practice
- Authors: Zheng He, Roman Pogodin, Yazhe Li, Namrata Deka, Arthur Gretton, Danica J. Sutherland
- Abstract summary: Tests of conditional independence (CI) underpin a number of important problems in machine learning and statistics. Shah and Peters (2020) showed that, contrary to the unconditional case, no universally finite-sample valid test can ever achieve nontrivial power. We investigate the Kernel-based Conditional Independence (KCI) test and identify the major factors underlying its practical behavior.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tests of conditional independence (CI) underpin a number of important problems in machine learning and statistics, from causal discovery to evaluation of predictor fairness and out-of-distribution robustness. Shah and Peters (2020) showed that, contrary to the unconditional case, no universally finite-sample valid test can ever achieve nontrivial power. While informative, this result (based on "hiding" dependence) does not seem to explain the frequent practical failures observed with popular CI tests. We investigate the Kernel-based Conditional Independence (KCI) test - of which we show the Generalized Covariance Measure underlying many recent tests is nearly a special case - and identify the major factors underlying its practical behavior. We highlight the key role of errors in the conditional mean embedding estimate for the Type-I error, while pointing out the importance of selecting an appropriate conditioning kernel (not recognized in previous work) as being necessary for good test power but also tending to inflate Type-I error.
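To make the abstract's moving parts concrete, below is a minimal NumPy sketch of a KCI-style statistic, assuming Gaussian kernels with fixed bandwidths and a moment-matched gamma null approximation in the spirit of Zhang et al. (2011); the bandwidths, the ridge parameter eps, and the function names are illustrative choices, not the authors' implementation. The residual operator R is where the conditional mean embedding is estimated (the Type-I error factor the paper highlights), and bw_z controls the conditioning kernel (the power/level tradeoff it identifies).
```python
import numpy as np
from scipy.stats import gamma

def gaussian_kernel(A, bandwidth=1.0):
    # Pairwise squared distances, then a Gaussian (RBF) kernel.
    sq = np.sum(A**2, 1)[:, None] + np.sum(A**2, 1)[None, :] - 2 * A @ A.T
    return np.exp(-sq / (2 * bandwidth**2))

def kci_test(X, Y, Z, eps=1e-3, bw_z=1.0):
    n = len(Z)
    H = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    Kx = H @ gaussian_kernel(np.hstack([X, Z])) @ H   # KCI kernels (X, Z) jointly
    Ky = H @ gaussian_kernel(Y) @ H
    Kz = H @ gaussian_kernel(Z, bw_z) @ H             # the "conditioning kernel"
    # Residual operator of kernel ridge regression on Z: this is where the
    # conditional mean embedding is estimated, and where its errors enter.
    R = eps * np.linalg.inv(Kz + eps * np.eye(n))
    Kx_z, Ky_z = R @ Kx @ R, R @ Ky @ R               # residual kernel matrices
    stat = np.trace(Kx_z @ Ky_z)
    # Moment-matched gamma approximation to the null distribution (illustrative).
    mean = np.trace(Kx_z) * np.trace(Ky_z) / n
    var = 2.0 * np.trace(Kx_z @ Kx_z) * np.trace(Ky_z @ Ky_z) / n**2
    return stat, gamma.sf(stat, a=mean**2 / var, scale=var / mean)
```
In practice the bandwidths are usually set by the median heuristic and eps is tuned; both choices materially affect level and power, which is exactly the behavior the paper dissects.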
Related papers
- Toward Scalable and Valid Conditional Independence Testing with Spectral Representations
Conditional independence (CI) is untestable in many settings without additional assumptions. We introduce a practical bi-level contrastive algorithm to learn representations derived from the singular value decomposition of the partial covariance operator. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing.
arXiv Detail & Related papers (2025-12-22T16:05:18Z)
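As a loose illustration of the entry above (a reconstruction from the summary, not the paper's bi-level contrastive algorithm), the sketch below computes the singular values of an empirical partial covariance matrix, using fixed, pre-centered feature maps in place of learned representations:
```python
import numpy as np

def partial_covariance_svd(phi_x, phi_y, phi_z, reg=1e-3):
    # phi_*: (n, d) feature matrices, assumed column-centered.
    n = len(phi_x)
    def cov(a, b):
        return a.T @ b / n
    czz = cov(phi_z, phi_z) + reg * np.eye(phi_z.shape[1])
    # Sigma_{XY.Z} = Sigma_XY - Sigma_XZ Sigma_ZZ^{-1} Sigma_ZY
    p = cov(phi_x, phi_y) - cov(phi_x, phi_z) @ np.linalg.solve(czz, cov(phi_z, phi_y))
    return np.linalg.svd(p, compute_uv=False)   # singular values, descending
```
The test idea is that, given rich enough features, this matrix should be near zero under conditional independence, so large singular values signal conditional dependence.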
- COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
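A hypothetical sketch of the calibration step described above, assuming a one-sided Clopper-Pearson bound as the "confidence interval method" (the function names, alpha, and delta are illustrative, not from the paper):
```python
import numpy as np
from scipy.stats import beta

def error_upper_bound(errors, n, delta):
    # One-sided Clopper-Pearson upper confidence bound on the true error rate.
    if errors == n:
        return 1.0
    return beta.ppf(1 - delta, errors + 1, n - errors)

def calibrate_threshold(confidences, is_wrong, alpha=0.1, delta=0.05):
    # Candidate rule: keep an answer when its confidence is >= t.
    best = None
    for t in np.unique(confidences):
        keep = confidences >= t
        n = int(keep.sum())
        if n == 0:
            continue
        ub = error_upper_bound(int(is_wrong[keep].sum()), n, delta)
        if ub <= alpha:   # risk guarantee holds with probability >= 1 - delta
            best = t if best is None else min(best, t)   # most permissive valid t
    return best
# NOTE: a real procedure must also account for scanning many thresholds;
# this sketch ignores that multiplicity for brevity.
```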
- A Sample Efficient Conditional Independence Test in the Presence of Discretization
Applying Conditional Independence (CI) tests directly to discretized data can lead to incorrect conclusions. Recent advancements have sought to infer the correct CI relationship between the latent variables by binarizing observed data. Motivated by this, the paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z)
- Internal Incoherency Scores for Constraint-based Causal Discovery Algorithms
We propose internal coherency scores that allow testing for assumption violations and finite-sample errors. We illustrate our coherency scores on the PC algorithm with simulated and real-world datasets.
arXiv Detail & Related papers (2025-02-20T16:44:54Z)
- Practical Kernel Tests of Conditional Independence
SplitKCI is an automated method, based on data splitting, for bias control in the Kernel-based Conditional Independence (KCI) test. We show that our approach significantly improves test level control for KCI without sacrificing test power.
arXiv Detail & Related papers (2024-02-20T18:07:59Z)
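A rough reconstruction of the data-splitting idea above (not the authors' code): fit the conditional-mean-embedding ridge regression on one half of the sample, then form residual kernels and an HSIC-like statistic on the held-out half, so that regression error does not bias the statistic evaluated at the same points. The kernel choices and eps are assumptions.
```python
import numpy as np

def gauss(A, B, bw=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bw**2))

def split_kci_stat(X, Y, Z, eps=1e-3, seed=0):
    idx = np.random.default_rng(seed).permutation(len(Z))
    a, b = idx[: len(idx) // 2], idx[len(idx) // 2:]
    # Ridge regression weights learned on split A, evaluated at split B points:
    W = gauss(Z[b], Z[a]) @ np.linalg.inv(gauss(Z[a], Z[a]) + eps * np.eye(len(a)))
    def residual_kernel(V):
        # Kernel of (feature minus split-A prediction from Z), expanded:
        # K_bb - K_ba W' - W K_ba' + W K_aa W'
        Kbb, Kba = gauss(V[b], V[b]), gauss(V[b], V[a])
        return Kbb - Kba @ W.T - W @ Kba.T + W @ gauss(V[a], V[a]) @ W.T
    return np.trace(residual_kernel(X) @ residual_kernel(Y)) / len(b)
```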
- Precise Error Rates for Computationally Efficient Testing
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Conditional independence testing under misspecified inductive biases
We study the performance of regression-based CI tests under misspecified inductive biases.
Namely, we propose new approximations or upper bounds for the testing errors of three regression-based tests.
We introduce the Rao-Blackwellized Predictor Test (RBPT), a regression-based CI test robust against misspecified inductive biases.
arXiv Detail & Related papers (2023-07-05T17:53:13Z)
- Testing for Overfitting
We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data. We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z)
- Sequential Kernelized Independence Testing
We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- With Little Power Comes Great Responsibility
Underpowered experiments make it more difficult to discern the difference between statistical noise and meaningful model improvements.
Small test sets mean that most attempted comparisons to state-of-the-art models will not be adequately powered.
For machine translation, we find that typical test sets of 2000 sentences have approximately 75% power to detect differences of 1 BLEU point.
arXiv Detail & Related papers (2020-10-13T18:00:02Z)
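Power numbers like those quoted above can be approximated by a simple resampling calculation. The sketch below is illustrative only (it simplifies the metric to a mean of per-sentence scores, which corpus-level BLEU is not): it estimates the chance that a paired comparison of a given size rejects at level alpha.
```python
import numpy as np
from scipy.stats import t as tdist

def estimate_power(pilot_diffs, n=2000, alpha=0.05, n_sim=5000, seed=0):
    # pilot_diffs: per-sentence score differences between two systems from a
    # pilot set; resampled with replacement to simulate test sets of size n.
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        d = rng.choice(pilot_diffs, size=n, replace=True)
        stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # one-sample t statistic
        if tdist.sf(stat, df=n - 1) < alpha:             # one-sided p-value
            rejections += 1
    return rejections / n_sim   # estimated power at test-set size n
```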
- Cross-validation Confidence Intervals for Test Error
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm. These results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)
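In the spirit of the entry above, here is a minimal sketch of a normal-approximation confidence interval for cross-validated test error; using the naive sample variance of per-example losses is an assumption of this sketch, not the paper's estimator.
```python
import numpy as np
from scipy.stats import norm
from sklearn.model_selection import KFold

def cv_error_ci(model, X, y, loss, k=5, level=0.95):
    # loss(y_true, y_pred) must return per-example losses, e.g. squared errors.
    losses = np.empty(len(y), dtype=float)
    for tr, te in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        fitted = model.fit(X[tr], y[tr])
        losses[te] = loss(y[te], fitted.predict(X[te]))
    mean = losses.mean()
    # CLT-style interval: mean +/- z * sd / sqrt(n)
    half = norm.ppf(0.5 + level / 2) * losses.std(ddof=1) / np.sqrt(len(y))
    return mean - half, mean + half
```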