A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics
- URL: http://arxiv.org/abs/2405.14051v3
- Date: Mon, 10 Feb 2025 04:27:47 GMT
- Title: A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics
- Authors: Yijin Ni, Xiaoming Huo
- Abstract summary: We show that these metrics can be unified under a general framework of kernel-based two-sample statistics. This paper establishes a novel uniform concentration inequality for the aforementioned kernel-based statistics. As illustrative applications, we demonstrate how these bounds facilitate the derivation of error bounds for procedures such as distance covariance-based dimension reduction.
- Score: 4.757470449749877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   In many contemporary statistical and machine learning methods, one needs to optimize an objective function that depends on the discrepancy between two probability distributions. The discrepancy can be referred to as a metric for distributions. Widely adopted examples of such a metric include Energy Distance (ED), distance Covariance (dCov), Maximum Mean Discrepancy (MMD), and the Hilbert-Schmidt Independence Criterion (HSIC). We show that these metrics can be unified under a general framework of kernel-based two-sample statistics.   This paper establishes a novel uniform concentration inequality for the aforementioned kernel-based statistics. Our results provide upper bounds for estimation errors in the associated optimization problems, thereby offering both finite-sample and asymptotic performance guarantees. As illustrative applications, we demonstrate how these bounds facilitate the derivation of error bounds for procedures such as distance covariance-based dimension reduction, distance covariance-based independent component analysis, MMD-based fairness-constrained inference, MMD-based generative model search, and MMD-based generative adversarial networks. 
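
As a concrete instance of the kernel-based two-sample statistics named in the abstract, the following is a minimal sketch of the standard unbiased estimator of the squared MMD. It is not taken from the paper: the Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel` and `mmd2_unbiased` are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b.

    The kernel choice and the bandwidth default are assumptions for illustration.
    """
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased U-statistic estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


rng = np.random.default_rng(0)
# Two samples from the same distribution: the estimate should be near zero.
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
# Second sample shifted by 1 in every coordinate: the estimate should be clearly positive.
shifted = mmd2_unbiased(
    rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2))
)
print(f"same distribution: {same:.4f}, shifted distribution: {shifted:.4f}")
```

In the optimization problems the abstract lists, an estimate like this would be evaluated over a whole family of parameters, which is the setting where a uniform rather than pointwise concentration bound becomes relevant.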
 
      
Related papers
- Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators [11.899035547580201]
 We introduce a novel distance between measures that compares them through a Schatten norm of their kernel covariance operators. We show that this new distance is an integral probability metric that can be framed between a Maximum Mean Discrepancy (MMD) and a Wasserstein distance.
 arXiv  Detail & Related papers  (2025-07-08T14:56:44Z)
- Kernel Quantile Embeddings and Associated Probability Metrics [12.484632369259659]
 We introduce the notion of kernel quantile embeddings (KQEs). We use KQEs to construct a family of distances that: (i) are probability metrics under weaker kernel conditions than MMD; (ii) recover a kernelised form of the sliced Wasserstein distance; and (iii) can be efficiently estimated with near-linear cost.
 arXiv  Detail & Related papers  (2025-05-26T18:27:17Z)
- Consistent Estimation of a Class of Distances Between Covariance Matrices [7.291687946822539]
 We are interested in the family of distances that can be expressed as sums of traces of functions that are separately applied to each covariance matrix.
A statistical analysis of the behavior of this class of distance estimators has also been conducted.
We present a central limit theorem that establishes the Gaussianity of these estimators and provides closed form expressions for the corresponding means and variances.
 arXiv  Detail & Related papers  (2024-09-18T07:36:25Z)
- A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
 We present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs.
We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
 arXiv  Detail & Related papers  (2024-08-29T17:46:18Z)
- Statistical Framework for Clustering MU-MIMO Wireless via Second Order Statistics [8.195126516665914]
 We consider an estimator of the Log-Euclidean distance between multiple sample covariance matrices (SCMs) that is consistent when the number of samples and the observation size grow unbounded at the same rate.
We develop a statistical framework that allows accurate predictions of the clustering algorithm's performance under realistic conditions.
 arXiv  Detail & Related papers  (2024-08-08T14:23:06Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
 In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
 arXiv  Detail & Related papers  (2024-01-29T02:08:40Z)
- Partial identification of kernel based two sample tests with mismeasured data [5.076419064097733]
 Two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications.
We study the estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other.
We propose a method to estimate these bounds, and show that it gives estimates that converge to the sharpest possible bounds on the MMD as sample size increases.
 arXiv  Detail & Related papers  (2023-08-07T13:21:58Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
 Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
 arXiv  Detail & Related papers  (2022-09-27T07:58:25Z)
- Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
 Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P.
In this article we derive new sufficient and necessary conditions to ensure (i) and (ii).
For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
 arXiv  Detail & Related papers  (2022-09-26T16:41:16Z)
- Cycle Consistent Probability Divergences Across Different Spaces [38.43511529063335]
 Discrepancy measures between probability distributions are at the core of statistical inference and machine learning.
This work proposes a novel unbalanced Monge optimal transport formulation for matching, up to isometries, distributions on different spaces.
 arXiv  Detail & Related papers  (2021-11-22T16:35:58Z)
- Maximum Mean Discrepancy for Generalization in the Presence of Distribution and Missingness Shift [0.0]
 We find that integrating an MMD loss component helps models use the best features for generalization and avoid dangerous extrapolation as much as possible for each test sample.
Models treated with this MMD approach show better performance, calibration, and extrapolation on the test set.
 arXiv  Detail & Related papers  (2021-11-19T18:01:05Z)
- On the Optimization Landscape of Maximum Mean Discrepancy [26.661542645011046]
 Generative models have been successfully used for generating realistic signals.
Because the likelihood function is typically intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation.
In particular, it is not understood when they can minimize their non-convex objectives globally.
 arXiv  Detail & Related papers  (2021-10-26T07:32:37Z)
- Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
 Kernel techniques are among the most popular and flexible approaches in data science.
Mean embedding gives rise to a divergence measure referred to as maximum mean discrepancy (MMD).
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically.
 arXiv  Detail & Related papers  (2021-10-15T21:29:27Z)
- Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
 kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
 arXiv  Detail & Related papers  (2021-09-29T22:54:17Z)
- Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold [41.58534159822546]
 This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between dimensionality-reduced conditional distributions.
We provide optimality guarantees and explicitly show the theoretical effect in practical settings.
 arXiv  Detail & Related papers  (2021-09-23T08:06:02Z)
- A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
 Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
 arXiv  Detail & Related papers  (2021-06-18T08:33:45Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
 Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
 arXiv  Detail & Related papers  (2021-06-07T17:47:16Z)
- Entropy Minimizing Matrix Factorization [102.26446204624885]
 Nonnegative Matrix Factorization (NMF) is a widely-used data analysis technique, and has yielded impressive results in many real-world tasks.
In this study, an Entropy Minimizing Matrix Factorization framework (EMMF) is developed to tackle the above problem.
Considering that the outliers are usually much less than the normal samples, a new entropy loss function is established for matrix factorization.
 arXiv  Detail & Related papers  (2021-03-24T21:08:43Z)
- Rethink Maximum Mean Discrepancy for Domain Adaptation [77.2560592127872]
 This paper theoretically proves two essential facts: 1) minimizing the Maximum Mean Discrepancy is equivalent to maximizing the source and target intra-class distances respectively while jointly minimizing their variance with some implicit weights, so that the feature discriminability degrades.
Experiments on several benchmark datasets not only validate the theoretical results but also demonstrate that our approach can substantially outperform the comparative state-of-the-art methods.
 arXiv  Detail & Related papers  (2020-07-01T18:25:10Z)
- Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
 Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples.
 arXiv  Detail & Related papers  (2020-02-26T16:37:37Z)
- Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond [69.83813153444115]
 We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome step.
 arXiv  Detail & Related papers  (2019-12-30T14:42:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     