Measuring Sample Quality with Copula Discrepancies
- URL: http://arxiv.org/abs/2507.21434v1
- Date: Tue, 29 Jul 2025 02:11:45 GMT
- Title: Measuring Sample Quality with Copula Discrepancies
- Authors: Agnideep Aich, Ashit Baran Aich, Bruce Wade
- Abstract summary: Copula Discrepancy (CD) is a principled and computationally efficient diagnostic for dependence structure. Our theoretical framework provides the first structure-aware diagnostic specifically designed for the era of approximate inference. With computational overhead orders of magnitude lower than existing Stein discrepancies, the CD provides both immediate practical value for MCMC practitioners and a theoretical foundation for the next generation of structure-aware sample quality assessment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The scalable Markov chain Monte Carlo (MCMC) algorithms that underpin modern Bayesian machine learning, such as Stochastic Gradient Langevin Dynamics (SGLD), sacrifice asymptotic exactness for computational speed, creating a critical diagnostic gap: traditional sample quality measures fail catastrophically when applied to biased samplers. While powerful Stein-based diagnostics can detect distributional mismatches, they provide no direct assessment of dependence structure, often the primary inferential target in multivariate problems. We introduce the Copula Discrepancy (CD), a principled and computationally efficient diagnostic that leverages Sklar's theorem to isolate and quantify the fidelity of a sample's dependence structure independent of its marginals. Our theoretical framework provides the first structure-aware diagnostic specifically designed for the era of approximate inference. Empirically, we demonstrate that a moment-based CD dramatically outperforms standard diagnostics like effective sample size for hyperparameter selection in biased MCMC, correctly identifying optimal configurations where traditional methods fail. Furthermore, our robust MLE-based variant can detect subtle but critical mismatches in tail dependence that remain invisible to rank correlation-based approaches, distinguishing between samples with identical Kendall's tau but fundamentally different extreme-event behavior. With computational overhead orders of magnitude lower than existing Stein discrepancies, the CD provides both immediate practical value for MCMC practitioners and a theoretical foundation for the next generation of structure-aware sample quality assessment.
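The abstract's central idea, that Sklar's theorem lets dependence structure be assessed separately from the marginals, can be illustrated with a small sketch. The code below is not the paper's CD estimator, whose definition the abstract does not give. It is a hypothetical stand-in that compares a sample's empirical Kendall's tau with the tau implied by an assumed Gaussian-copula target, using the standard identity tau = (2/pi) arcsin(rho). The function names `kendall_tau` and `tau_discrepancy`, the sample size, and the choice rho = 0.6 are all assumptions made for illustration.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a from pairwise sign comparisons (O(n^2), small n only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # Every unordered pair appears twice in the full matrix; the diagonal is zero.
    return float((sx * sy).sum() / (n * (n - 1)))


def tau_discrepancy(sample, target_tau):
    """Hypothetical moment-style discrepancy: |empirical tau - target tau|."""
    return abs(kendall_tau(sample[:, 0], sample[:, 1]) - target_tau)


rng = np.random.default_rng(0)
rho = 0.6  # assumed target correlation, chosen only for this illustration
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=500)

# Kendall's tau of a Gaussian copula with correlation rho.
target_tau = 2.0 / np.pi * np.arcsin(rho)

# Strictly increasing maps change the marginals but leave the ranks,
# and therefore the copula, unchanged.
transformed = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3])

d_raw = tau_discrepancy(z, target_tau)
d_transformed = tau_discrepancy(transformed, target_tau)
d_independent = tau_discrepancy(rng.standard_normal((500, 2)), target_tau)

print(f"correct dependence:       {d_raw:.3f}")
print(f"same copula, new margins: {d_transformed:.3f}")
print(f"independent sample:       {d_independent:.3f}")
```

The first two values coincide because the statistic depends only on ranks, while the independent sample scores far worse. As the abstract notes, a tau-based comparison of this kind cannot distinguish samples that share Kendall's tau but differ in tail dependence, which is the case the MLE-based variant is said to address.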
Related papers
- Correcting Mode Proportion Bias in Generalized Bayesian Inference via a Weighted Kernel Stein Discrepancy [0.0]
Generalized Bayesian Inference (GBI) provides a flexible framework for updating prior distributions using various loss functions instead of the traditional likelihoods. KSD-Bayes suffers from critical pathologies, including insensitivity to well-separated modes in multimodal posteriors. We propose a weighted KSD method that retains computational efficiency while effectively capturing multimodal structures.
arXiv Detail & Related papers (2025-03-03T22:44:45Z) - Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains. We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - On sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery [21.12645737093305]
Conditional independence testing is an essential step in constraint-based causal discovery algorithms.
We design a test for conditional independence based on our estimator, called VM-CI, which achieves optimal parametric rates.
We empirically show that VM-CI outperforms other popular CI tests in terms of either time or sample complexity.
arXiv Detail & Related papers (2023-10-20T14:52:25Z) - A Targeted Accuracy Diagnostic for Variational Approximations [8.969208467611896]
Variational Inference (VI) is an attractive alternative to Markov Chain Monte Carlo (MCMC).
Existing methods characterize the quality of the whole variational distribution.
We propose the TArgeted Diagnostic for Distribution Approximation Accuracy (TADDAA).
arXiv Detail & Related papers (2023-02-24T02:50:18Z) - Controlling Moments with Kernel Stein Discrepancies [74.82363458321939]
Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation. We first show that standard KSDs used for weak convergence control fail to control moment convergence. We then provide sufficient conditions under which alternative diffusion KSDs control both moment and weak convergence.
arXiv Detail & Related papers (2022-11-10T08:24:52Z) - Uncertainty in Extreme Multi-label Classification [81.14232824864787]
eXtreme Multi-label Classification (XMC) is an essential task in the era of big data for web-scale machine learning applications.
In this paper, we aim to investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework.
In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general approximation framework based on beam search to efficiently estimate the uncertainty with a theoretical guarantee under long-tail XMC predictions.
arXiv Detail & Related papers (2022-10-18T20:54:33Z) - Reframed GES with a Neural Conditional Dependence Measure [20.47061693587848]
We revisit the Greedy Equivalence Search (GES) algorithm, which is widely cited as a score-based algorithm for learning the Markov equivalence class (MEC)
We present a reframing of the GES algorithm, which is more flexible than the standard score-based version.
We propose a neural conditional dependence measure, which utilizes the expressive power of deep neural networks.
arXiv Detail & Related papers (2022-06-17T03:29:08Z) - Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z) - High-dimensional Inference and FDR Control for Simulated Markov Random Fields [1.9458156037869137]
This article explores statistical inference for simulated Markov random fields in high-dimensional settings.
We introduce a methodology based on Markov Chain Monte Carlo Maximum Likelihood Estimation with Elastic-net regularization.
arXiv Detail & Related papers (2022-02-11T13:49:08Z) - A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the unified form, we show that JMMD degrades the feature-label dependence that benefits classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z) - Minimax Quasi-Bayesian estimation in sparse canonical correlation analysis via a Rayleigh quotient function [1.0878040851638]
Existing rate-optimal estimators for sparse canonical vectors have high computational cost.
We propose a quasi-Bayesian estimation procedure that achieves the minimax estimation rate.
We use the proposed methodology to maximally correlate clinical variables and proteomic data to better understand COVID-19.
arXiv Detail & Related papers (2020-10-16T21:00:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.