Signature Maximum Mean Discrepancy Two-Sample Statistical Tests
- URL: http://arxiv.org/abs/2506.01718v1
- Date: Mon, 02 Jun 2025 14:26:58 GMT
- Title: Signature Maximum Mean Discrepancy Two-Sample Statistical Tests
- Authors: Andrew Alden, Blanka Horvath, Zacharia Issa,
- Abstract summary: This work is dedicated to understanding the possibilities and challenges associated with applying the sig-MMD as a statistical tool in practice.<n>We introduce and explain the sig-MMD, and provide easily accessible and verifiable examples for its practical use.
- Score: 0.5461938536945723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maximum Mean Discrepancy (MMD) is a widely used concept in machine learning research which has gained popularity in recent years as a highly effective tool for comparing (finite-dimensional) distributions. Since it is designed as a kernel-based method, the MMD can be extended to path space valued distributions using the signature kernel. The resulting signature MMD (sig-MMD) can be used to define a metric between distributions on path space. Similarly to the original use case of the MMD as a test statistic within a two-sample testing framework, the sig-MMD can be applied to determine if two sets of paths are drawn from the same stochastic process. This work is dedicated to understanding the possibilities and challenges associated with applying the sig-MMD as a statistical tool in practice. We introduce and explain the sig-MMD, and provide easily accessible and verifiable examples for its practical use. We present examples that can lead to Type 2 errors in the hypothesis test, falsely indicating that samples have been drawn from the same underlying process (which generally occurs in a limited data setting). We then present techniques to mitigate the occurrence of this type of error.
Related papers
- An Efficient Permutation-Based Kernel Two-Sample Test [13.229867216847534]
Two-sample hypothesis testing is a fundamental problem in statistics and machine learning.<n>In this work, we use a Nystr"om approximation of the maximum mean discrepancy (MMD) to design a computationally efficient and practical testing algorithm.
arXiv Detail & Related papers (2025-02-19T09:22:48Z) - Partial identification of kernel based two sample tests with mismeasured
data [5.076419064097733]
Two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications.
We study the estimation of the MMD under $epsilon$-contamination, where a possibly non-random $epsilon$ proportion of one distribution is erroneously grouped with the other.
We propose a method to estimate these bounds, and show that it gives estimates that converge to the sharpest possible bounds on the MMD as sample size increases.
arXiv Detail & Related papers (2023-08-07T13:21:58Z) - Hard-normal Example-aware Template Mutual Matching for Industrial Anomaly Detection [78.734927709231]
Anomaly detectors are widely used in industrial manufacturing to detect and localize unknown defects in query images.<n>These detectors are trained on anomaly-free samples and have successfully distinguished anomalies from most normal samples.<n>However, hard-normal examples are scattered and far apart from most normal samples, and thus they are often mistaken for anomalies by existing methods.
arXiv Detail & Related papers (2023-03-28T17:54:56Z) - On Calibrating Diffusion Probabilistic Models [78.75538484265292]
diffusion probabilistic models (DPMs) have achieved promising results in diverse generative tasks.
We propose a simple way for calibrating an arbitrary pretrained DPM, with which the score matching loss can be reduced and the lower bounds of model likelihood can be increased.
Our calibration method is performed only once and the resulting models can be used repeatedly for sampling.
arXiv Detail & Related papers (2023-02-21T14:14:40Z) - A Permutation-free Kernel Two-Sample Test [36.50719125230106]
We propose a new quadratic-time MMD test statistic based on sample-splitting and studentization.
For large sample sizes, our new cross-MMD provides a significant speedup over the MMD, for only a slight loss in power.
arXiv Detail & Related papers (2022-11-27T18:15:52Z) - Maximum Mean Discrepancy on Exponential Windows for Online Change Detection [3.1631981412766335]
We propose a new change detection algorithm, called Maximum Mean Discrepancy on Exponential Windows (MMDEW)<n>MMDEW combines the benefits of MMD with an efficient computation based on exponential windows.<n>We prove that MMDEW enjoys polylogarithmic runtime and logarithmic memory complexity and show empirically that it outperforms the state of the art on benchmark data streams.
arXiv Detail & Related papers (2022-05-25T12:02:59Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic
Approach to Manifold Dimension Estimation [92.81218653234669]
We present new approach to manifold hypothesis checking and underlying manifold dimension estimation.
Our geometrical method is a modification for sparse data of a well-known box-counting algorithm for Minkowski dimension calculation.
Experiments on real datasets show that the suggested approach based on two methods combination is powerful and effective.
arXiv Detail & Related papers (2021-07-08T15:35:54Z) - A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence that benefits to classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z) - DWMD: Dimensional Weighted Orderwise Moment Discrepancy for
Domain-specific Hidden Representation Matching [21.651807102769954]
Key challenge in this field is establishing a metric that can measure the data distribution discrepancy between two homogeneous domains.
We propose a novel moment-based probability distribution metric termed dimensional weighted orderwise moment discrepancy (DWMD) for feature representation matching.
Our metric function takes advantage of a series for high-order moment alignment, and we theoretically prove that our DWMD metric function is error-free.
arXiv Detail & Related papers (2020-07-18T02:37:32Z) - Learning to Match Distributions for Domain Adaptation [116.14838935146004]
This paper proposes Learning to Match (L2M) to automatically learn the cross-domain distribution matching.
L2M reduces the inductive bias by using a meta-network to learn the distribution matching loss in a data-driven way.
Experiments on public datasets substantiate the superiority of L2M over SOTA methods.
arXiv Detail & Related papers (2020-07-17T03:26:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.