Collaborative non-parametric two-sample testing
- URL: http://arxiv.org/abs/2402.05715v1
- Date: Thu, 8 Feb 2024 14:43:56 GMT
- Title: Collaborative non-parametric two-sample testing
- Authors: Alejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos
- Abstract summary: The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
- Score: 55.98760097296213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the multiple two-sample test problem in a
graph-structured setting, which is a common scenario in fields such as Spatial
Statistics and Neuroscience. Each node $v$ in a fixed graph deals with a
two-sample testing problem between two node-specific probability density
functions (pdfs), $p_v$ and $q_v$. The goal is to identify nodes where the null
hypothesis $p_v = q_v$ should be rejected, under the assumption that connected
nodes would yield similar test outcomes. We propose the non-parametric
collaborative two-sample testing (CTST) framework that efficiently leverages
the graph structure and minimizes the assumptions over $p_v$ and $q_v$. Our
methodology integrates elements from f-divergence estimation, Kernel Methods,
and Multitask Learning. We use synthetic experiments and a real sensor network
detecting seismic activity to demonstrate that CTST outperforms
state-of-the-art non-parametric statistical tests that are applied at each node
independently and hence disregard the geometry of the problem.
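For concreteness, here is a minimal sketch (not the authors' code) of the node-independent baseline the abstract refers to: a kernel two-sample test with the MMD statistic and a permutation threshold, run separately at each node and hence blind to the graph. The function names, the Gaussian kernel, and the median-heuristic bandwidth are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, bw):
    """Gaussian kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bw^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def mmd2(X, Y, bw):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return (gaussian_kernel(X, X, bw).mean()
            + gaussian_kernel(Y, Y, bw).mean()
            - 2 * gaussian_kernel(X, Y, bw).mean())

def node_level_test(X_v, Y_v, alpha=0.05, n_perm=200, seed=0):
    """Permutation two-sample test at one node: rejects H0: p_v = q_v or not."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X_v, Y_v])
    # Median heuristic for the bandwidth (an assumed, standard choice).
    d2 = ((pooled[:, None, :] - pooled[None, :, :]) ** 2).sum(-1)
    bw = np.sqrt(np.median(d2[d2 > 0]) / 2)
    stat = mmd2(X_v, Y_v, bw)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(mmd2(pooled[idx[:len(X_v)]], pooled[idx[len(X_v):]], bw))
    p_value = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    return p_value < alpha

# Hypothetical usage, ignoring the graph entirely (the baseline CTST improves on):
# rejected = {v: node_level_test(samples_p[v], samples_q[v]) for v in graph_nodes}
```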
Related papers
- Doubly Robust Conditional Independence Testing with Generative Neural Networks [8.323172773256449]
This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$.
We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions.
arXiv Detail & Related papers (2024-07-25T01:28:59Z)
- Network two-sample test for block models [16.597465729143813]
We consider the two-sample testing problem for networks, where the goal is to determine whether two sets of networks originated from the same model.
We adopt the stochastic block model (SBM) for network distributions, due to its interpretability and its potential to approximate more general models.
We introduce an efficient algorithm to match estimated network parameters, allowing us to properly combine and contrast information within and across samples, leading to a powerful test.
arXiv Detail & Related papers (2024-06-10T04:28:37Z)
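As a hypothetical illustration of the quantity such a test compares, the sketch below estimates block-level edge probabilities from two samples of networks and forms entrywise two-proportion z-statistics. It assumes known, shared community labels, whereas the paper's contribution is precisely an algorithm for matching estimated parameters; treat this as background, not the paper's method.

```python
import numpy as np

def block_counts(A, labels, k):
    """Edge counts and possible-pair counts between blocks a <= b (A symmetric, 0/1)."""
    edges, pairs = np.zeros((k, k)), np.zeros((k, k))
    idx = [np.where(labels == a)[0] for a in range(k)]
    for a in range(k):
        for b in range(a, k):
            sub = A[np.ix_(idx[a], idx[b])]
            if a == b:
                edges[a, b] = np.triu(sub, 1).sum()
                pairs[a, b] = len(idx[a]) * (len(idx[a]) - 1) / 2
            else:
                edges[a, b] = sub.sum()
                pairs[a, b] = len(idx[a]) * len(idx[b])
    return edges, pairs

def sbm_two_sample_z(sample1, sample2, labels, k):
    """Entrywise two-proportion z-statistics between block probabilities
    estimated from two samples of adjacency matrices (upper triangle only)."""
    labels = np.asarray(labels)
    e1 = sum(block_counts(A, labels, k)[0] for A in sample1)
    m1 = sum(block_counts(A, labels, k)[1] for A in sample1)
    e2 = sum(block_counts(A, labels, k)[0] for A in sample2)
    m2 = sum(block_counts(A, labels, k)[1] for A in sample2)
    p1, p2 = e1 / np.maximum(m1, 1), e2 / np.maximum(m2, 1)
    pool = (e1 + e2) / np.maximum(m1 + m2, 1)
    var = pool * (1 - pool) * (1 / np.maximum(m1, 1) + 1 / np.maximum(m2, 1))
    return (p1 - p2) / np.sqrt(np.maximum(var, 1e-12))
```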
- Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-parity problem on a $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD can solve the $k$-parity problem with small statistical error.
arXiv Detail & Related papers (2024-04-18T17:57:53Z)
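A toy version of this setup fits in a few lines: draw inputs uniformly from the hypercube $\{-1,+1\}^d$, label them by the parity of the first $k$ coordinates, and run plain SGD on a small two-layer ReLU network. All hyperparameters below are illustrative guesses; whether the network reaches high accuracy depends on the width, learning rate, and sample size, which the paper studies theoretically.

```python
import numpy as np

def parity_data(n, d, k, rng):
    """Inputs uniform on {-1,+1}^d; label = parity (product) of the first k coordinates."""
    X = rng.choice([-1.0, 1.0], size=(n, d))
    return X, np.prod(X[:, :k], axis=1)

def sgd_two_layer(X, y, width=128, lr=0.05, epochs=100, seed=0):
    """Plain SGD with squared loss on a two-layer ReLU network (illustrative scales)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1 / np.sqrt(d), size=(width, d))   # first-layer weights
    a = rng.normal(scale=1 / np.sqrt(width), size=width)    # output weights
    for _ in range(epochs):
        for i in rng.permutation(n):
            h = np.maximum(W @ X[i], 0.0)       # hidden activations
            err = a @ h - y[i]                  # prediction residual
            grad_a = err * h
            grad_W = err * np.outer(a * (h > 0), X[i])
            a -= lr * grad_a
            W -= lr * grad_W
    return W, a

rng = np.random.default_rng(0)
X, y = parity_data(2000, d=20, k=3, rng=rng)
W, a = sgd_two_layer(X, y)
X_te, y_te = parity_data(500, d=20, k=3, rng=rng)
acc = (np.sign(np.maximum(X_te @ W.T, 0.0) @ a) == y_te).mean()
print(f"test accuracy: {acc:.2f}")
```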
- Kernel-Based Tests for Likelihood-Free Hypothesis Testing [21.143798051525646]
Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to one of the two classes.
Special cases of this problem are well known: when $m=1$ it corresponds to binary classification, and when $m \approx n$ it is equivalent to two-sample testing.
Recent work discovered a fundamental trade-off between $m$ and $n$: increasing the sample size $m$ reduces the amount $n$ of training/simulation data needed.
arXiv Detail & Related papers (2023-08-17T15:24:03Z)
- Collaborative likelihood-ratio estimation over graphs [55.98760097296213]
We develop a concrete non-parametric method that we call Graph-based Relative Unconstrained Least-squares Importance Fitting (GRULSIF).
We derive convergence rates for our collaborative approach that highlight the role played by variables such as the number of available observations per node, the size of the graph, and how accurately the graph structure encodes the similarity between tasks.
arXiv Detail & Related papers (2022-05-28T15:37:03Z)
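GRULSIF extends least-squares importance fitting across the nodes of a graph. As background, here is a single-task sketch of the classical unconstrained least-squares importance fitting (uLSIF) estimator of a density ratio, which admits a closed-form solution; the graph-coupled, multitask version is the paper's contribution and is omitted here. The center count, bandwidth, and regularization strength are assumed values.

```python
import numpy as np

def gaussian_features(X, centers, sigma):
    """Basis functions phi_c(x) = exp(-||x - c||^2 / (2 * sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def ulsif(X_num, X_den, sigma=1.0, lam=1e-3, n_centers=50, seed=0):
    """Fit r(x) ~ p_num(x) / p_den(x) as a linear model over Gaussian features.

    Closed form of the regularized least-squares objective:
        theta = (H + lam * I)^{-1} h,
    with H the feature second moment under the denominator sample and
    h the feature mean under the numerator sample.
    """
    rng = np.random.default_rng(seed)
    pick = rng.choice(len(X_num), size=min(n_centers, len(X_num)), replace=False)
    centers = X_num[pick]
    Phi_den = gaussian_features(X_den, centers, sigma)
    H = Phi_den.T @ Phi_den / len(X_den)
    h = gaussian_features(X_num, centers, sigma).mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda X: gaussian_features(X, centers, sigma) @ theta
```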
- A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks [46.62713126719579]
Two-sample tests aim to determine whether two collections of observations follow the same distribution or not.
We propose two-sample tests based on the integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold.
Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
arXiv Detail & Related papers (2022-05-04T13:03:31Z)
- Distributed Sparse Feature Selection in Communication-Restricted Networks [6.9257380648471765]
We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection.
In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective method for information sharing in the network.
arXiv Detail & Related papers (2021-11-02T05:02:24Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
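A bare-bones rendition of clustering via PCA, with a simple hard-thresholding step standing in for the sparsity constraint, is sketched below; the paper's algorithm and its minimax analysis are more involved, so this is only meant to fix ideas.

```python
import numpy as np

def sparse_pca_cluster(X, s):
    """Two-cluster labels from a hard-thresholded leading principal direction.

    The thresholding (keep the s largest-magnitude coordinates) is a crude
    stand-in for the sparsity constraint analyzed in the paper.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0].copy()                             # leading principal direction
    keep = np.argsort(np.abs(v))[-s:]            # indices of the s largest entries
    mask = np.zeros_like(v)
    mask[keep] = 1.0
    scores = Xc @ (v * mask)                     # project onto sparsified direction
    return (scores > 0).astype(int)              # cluster labels in {0, 1}
```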
- Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
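The key idea, choosing the kernel on held-out data to maximize test power and then testing on fresh data, can be sketched without a deep network. Below, a grid of Gaussian bandwidths stands in for the deep-kernel parameters, and the raw MMD statistic stands in for the paper's power criterion (an estimate of MMD$^2/\sigma$); both substitutions are simplifying assumptions.

```python
import numpy as np

def mmd2(X, Y, bw):
    """Biased squared MMD with a Gaussian kernel of bandwidth bw."""
    k = lambda A, B: np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
                            / (2 * bw ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def tuned_kernel_test(X, Y, bandwidths=(0.25, 0.5, 1.0, 2.0, 4.0),
                      n_perm=200, alpha=0.05, seed=0):
    """Data-splitting test: tune the kernel on one half, test on the other."""
    rng = np.random.default_rng(seed)
    nx, ny = len(X) // 2, len(Y) // 2
    X_tr, X_te = X[:nx], X[nx:]
    Y_tr, Y_te = Y[:ny], Y[ny:]
    # 'Training': pick the kernel parameter with the largest statistic,
    # standing in for gradient-based training of a deep kernel.
    bw = max(bandwidths, key=lambda b: mmd2(X_tr, Y_tr, b))
    stat = mmd2(X_te, Y_te, bw)
    pooled = np.vstack([X_te, Y_te])
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(mmd2(pooled[idx[:len(X_te)]], pooled[idx[len(X_te):]], bw))
    p = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    return p < alpha, p
```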