Anchor-based Maximum Discrepancy for Relative Similarity Testing
- URL: http://arxiv.org/abs/2510.10477v1
- Date: Sun, 12 Oct 2025 07:03:49 GMT
- Title: Anchor-based Maximum Discrepancy for Relative Similarity Testing
- Authors: Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu
- Abstract summary: Relative similarity testing aims to determine which of the distributions, P or Q, is closer to an anchor distribution U. Existing kernel-based approaches often test the relative similarity with a fixed kernel in a manually specified alternative hypothesis. We propose an anchor-based maximum discrepancy (AMD), which defines the relative similarity as the maximum discrepancy between the distances of (U, P) and (U, Q) in a space of deep kernels.
- Score: 6.903548360333296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relative similarity testing aims to determine which of the distributions, P or Q, is closer to an anchor distribution U. Existing kernel-based approaches often test the relative similarity with a fixed kernel under a manually specified alternative hypothesis, e.g., Q is closer to U than P. Although kernel selection is known to be important to kernel-based testing methods, the manually specified hypothesis poses a significant challenge for kernel selection in relative similarity testing: once the hypothesis is specified first, we can always find a kernel such that the hypothesis is rejected. This challenge makes relative similarity testing ill-defined when we want to select a good kernel after the hypothesis is specified. In this paper, we cope with this challenge by learning a proper hypothesis and a kernel simultaneously, instead of learning a kernel after manually specifying the hypothesis. We propose an anchor-based maximum discrepancy (AMD), which defines the relative similarity as the maximum discrepancy between the distances of (U, P) and (U, Q) in a space of deep kernels. Based on AMD, our testing incorporates two phases. In Phase I, we estimate the AMD over the deep kernel space and infer the potential hypothesis. In Phase II, we assess the statistical significance of the potential hypothesis, where we propose a unified testing framework to derive thresholds for tests over the different possible hypotheses from Phase I. Lastly, we validate our method theoretically and demonstrate its effectiveness via extensive experiments on benchmark datasets. Code is publicly available at: https://github.com/zhijianzhouml/AMD.
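The core comparison behind the abstract can be illustrated with a minimal sketch. This is not the authors' method: it uses a fixed Gaussian kernel instead of a learned deep kernel, and a naive label-permutation threshold instead of the paper's unified Phase II framework. It merely shows the basic quantity being contrasted, MMD²(U, P) versus MMD²(U, Q):

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of x and y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    # Biased estimator of the squared maximum mean discrepancy.
    return (gaussian_kernel(x, x, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean()
            - 2 * gaussian_kernel(x, y, bandwidth).mean())

def relative_similarity_test(u, p, q, bandwidth=1.0, n_perm=200, seed=0):
    # Sign of the statistic suggests the hypothesis (positive: Q is
    # closer to U than P); permuting the P/Q sample labels gives a
    # crude significance threshold (a proxy, not the paper's Phase II).
    rng = np.random.default_rng(seed)
    stat = mmd2(u, p, bandwidth) - mmd2(u, q, bandwidth)
    pooled = np.vstack([p, q])
    n = len(p)
    perm_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        perm_stats.append(mmd2(u, pooled[idx[:n]], bandwidth)
                          - mmd2(u, pooled[idx[n:]], bandwidth))
    p_value = (1 + sum(abs(s) >= abs(stat) for s in perm_stats)) / (1 + n_perm)
    return stat, p_value
```

For example, with U and Q drawn from N(0, 1) and P from N(3, 1), the statistic comes out positive (Q is closer to U) and the permutation p-value is small.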
Related papers
- Regularized $f$-Divergence Kernel Tests [24.182732872327183]
We propose a framework to construct practical kernel-based two-sample tests from the family of $f$-divergences. We provide theoretical guarantees for statistical test power across our family of $f$-divergence estimates. For machine unlearning, we propose a relative test that distinguishes true unlearning failures from safe distributional variations.
arXiv Detail & Related papers (2026-01-27T16:15:48Z) - On Robust hypothesis testing with respect to Hellinger distance [0.0]
We study the hypothesis testing problem where the observed samples need not come from either of the specified hypotheses. In such a situation, we would like our test to be robust to this misspecification and output the distribution closer in Hellinger distance. Our main result is quantifying how close the underlying distribution has to be to either of the hypotheses.
arXiv Detail & Related papers (2025-10-19T08:20:43Z) - A Scalable Nyström-Based Kernel Two-Sample Test with Permutations [9.849635250118912]
Two-sample hypothesis testing is a fundamental problem in statistics and machine learning. In this work, we use a Nyström approximation of the maximum mean discrepancy (MMD) to design a computationally efficient and practical testing algorithm.
arXiv Detail & Related papers (2025-02-19T09:22:48Z) - Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z) - Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
arXiv Detail & Related papers (2022-09-26T16:41:16Z) - Kernel Robust Hypothesis Testing [20.78285964841612]
In this paper, uncertainty sets are constructed in a data-driven manner using kernel method.
The goal is to design a test that performs well under the worst-case distributions over the uncertainty sets.
For the Neyman-Pearson setting, the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm.
arXiv Detail & Related papers (2022-03-23T23:59:03Z) - Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL)
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z) - Maximum Mean Discrepancy Test is Aware of Adversarial Attacks [122.51040127438324]
The maximum mean discrepancy (MMD) test could in principle detect any distributional discrepancy between two datasets.
It has been shown that the MMD test is unaware of adversarial attacks.
arXiv Detail & Related papers (2020-10-22T03:42:12Z) - Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection [76.1522587605852]
Isolation Distributional Kernel (IDK) is a new way to measure the similarity between two distributions.
We demonstrate IDK's efficacy and efficiency as a new tool for kernel based anomaly detection for both point and group anomalies.
arXiv Detail & Related papers (2020-09-24T12:25:43Z) - Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.