On Ranking-based Tests of Independence
- URL: http://arxiv.org/abs/2403.07464v1
- Date: Tue, 12 Mar 2024 10:00:00 GMT
- Title: On Ranking-based Tests of Independence
- Authors: Myrto Limnios (UCPH), St\'ephan Cl\'emen\c{c}on (LTCI, IDS, S2A, IP
Paris)
- Abstract summary: We develop a novel nonparametric framework to test the independence of two random variables $mathbfX$ and $mathbfY$.
We consider a wide class of rank statistics encompassing many ways of deviating from the diagonal in the ROC space to build tests of independence.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we develop a novel nonparametric framework to test the
independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown
respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx dy)$,
based on {\it Receiver Operating Characteristic} (ROC) analysis and bipartite
ranking. The rationale behind our approach relies on the fact that, the
independence hypothesis $\mathcal{H}\_0$ is necessarily false as soon as the
optimal scoring function related to the pair of distributions $(H\otimes G,\;
F)$, obtained from a bipartite ranking algorithm, has a ROC curve that deviates
from the main diagonal of the unit square.We consider a wide class of rank
statistics encompassing many ways of deviating from the diagonal in the ROC
space to build tests of independence. Beyond its great flexibility, this new
method has theoretical properties that far surpass those of its competitors.
Nonasymptotic bounds for the two types of testing errors are established. From
an empirical perspective, the novel procedure we promote in this paper exhibits
a remarkable ability to detect small departures, of various types, from the
null assumption $\mathcal{H}_0$, even in high dimension, as supported by the
numerical experiments presented here.
Related papers
- Detection of Correlated Random Vectors [7.320365821066746]
Two random vectors $mathsfXinmathbbRn$ and $mathsfYinmathbbRn$ are correlated or not.
We analyze the thresholds at which optimal testing is information-theoretically impossible and possible.
arXiv Detail & Related papers (2024-01-24T12:58:08Z) - Testing Dependency of Unlabeled Databases [5.384630221560811]
Two random databases $mathsfXinmathcalXntimes d$ and $mathsfYinmathcalYntimes d$ are statistically dependent or not.
We characterize the thresholds at which optimal testing is information-theoretically impossible and possible.
arXiv Detail & Related papers (2023-11-10T05:17:03Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
Ground-truth predictor is additive $mathbbE[mathbfz mid mathbfx,mathbfy] = f_star(mathbfx) +g_star(mathbfy)$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(mathcalA,mathcalT)$.
We show that if the distribution on examples in the data passes the tester $mathcalT$ then one can safely trust the output of the agnostic $mathcalA$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, non- probability problems.
We present our analysis in a modular way and obtain a complementary $mathcal O (1 / TT)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - An $\ell^p$-based Kernel Conditional Independence Test [21.689461247198388]
We propose a new computationally efficient test for conditional independence based on the $Lp$ distance between two kernel-based representatives of well suited distributions.
We conduct a series of experiments showing that the performance of our new tests outperforms state-of-the-art methods both in term of statistical power and type-I error even in the high dimensional setting.
arXiv Detail & Related papers (2021-10-28T03:18:27Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $epsilon$-corrupted $k$-mixture within information theoretically near-optimal error proofs of $tildeO(epsilon)$.
Our main technical contribution is a new robust identifiability proof clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Optimal rates for independence testing via $U$-statistic permutation
tests [7.090165638014331]
We study the problem of independence testing given independent and identically distributed pairs taking values in a $sigma$-finite, separable measure space.
We first show that there is no valid test of independence that is uniformly consistent against alternatives of the form $f: D(f) geq rho2 $.
arXiv Detail & Related papers (2020-01-15T19:04:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.