Sequential Estimation of Convex Divergences using Reverse Submartingales and Exchangeable Filtrations
- URL: http://arxiv.org/abs/2103.09267v1
- Date: Tue, 16 Mar 2021 18:22:14 GMT
- Title: Sequential Estimation of Convex Divergences using Reverse Submartingales and Exchangeable Filtrations
- Authors: Tudor Manole, Aaditya Ramdas
- Abstract summary: We present a unified technique for sequential estimation of convex divergences between distributions.
The technical underpinnings of our approach lie in the observation that empirical convex divergences are (partially ordered) reverse submartingales.
These techniques appear to be powerful additions to the existing literature on both confidence sequences and convex divergences.
- Score: 31.088836418378534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a unified technique for sequential estimation of convex
divergences between distributions, including integral probability metrics like
the kernel maximum mean discrepancy, $\varphi$-divergences like the
Kullback-Leibler divergence, and optimal transport costs, such as powers of
Wasserstein distances. The technical underpinnings of our approach lie in the
observation that empirical convex divergences are (partially ordered) reverse
submartingales with respect to the exchangeable filtration, coupled with
maximal inequalities for such processes. These techniques appear to be powerful
additions to the existing literature on both confidence sequences and convex
divergences. We construct an offline-to-sequential device that converts a wide
array of existing offline concentration inequalities into time-uniform
confidence sequences that can be continuously monitored, providing valid
inference at arbitrary stopping times. The resulting sequential bounds pay only
an iterated logarithmic price over the corresponding fixed-time bounds,
retaining the same dependence on problem parameters (like dimension or alphabet
size if applicable).
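The reverse submartingale observation admits a one-line sketch (notation ours, not verbatim from the paper): write $\hat{P}_n$ for the empirical measure of $X_1, \dots, X_n$ and $\mathcal{E}_n$ for the exchangeable sigma-algebra. Conditional on $\mathcal{E}_{n+1}$, averaging over which point is left out gives $\mathbb{E}[\hat{P}_n \mid \mathcal{E}_{n+1}] = \hat{P}_{n+1}$, so for any divergence $D(\cdot, Q)$ convex in its first argument, Jensen's inequality yields

```latex
\mathbb{E}\bigl[D(\hat{P}_n, Q) \mid \mathcal{E}_{n+1}\bigr]
  \;\ge\; D\bigl(\mathbb{E}[\hat{P}_n \mid \mathcal{E}_{n+1}],\, Q\bigr)
  \;=\; D(\hat{P}_{n+1}, Q),
```

i.e., $(D(\hat{P}_n, Q))_n$ is a reverse submartingale. To make the offline-to-sequential device concrete, below is a minimal sketch for the kernel MMD under loudly flagged assumptions: `fixed_time_width` is a placeholder fixed-sample bound of the usual shape for bounded kernels, and `sequential_width` inflates it by an illustrative `C * sqrt(log log n)` factor; neither the constant `C` nor the width formula is the paper's actual construction.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Bounded kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return np.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def empirical_mmd2(xs, ys, kernel=gaussian_kernel):
    """Plug-in (V-statistic) estimate of MMD^2 between two samples."""
    kxx = np.mean([[kernel(a, b) for b in xs] for a in xs])
    kyy = np.mean([[kernel(a, b) for b in ys] for a in ys])
    kxy = np.mean([[kernel(a, b) for b in ys] for a in xs])
    return kxx + kyy - 2.0 * kxy

def fixed_time_width(n, alpha):
    """Placeholder offline bound of the usual O(sqrt(log(1/alpha)/n)) shape."""
    return np.sqrt(2.0 * np.log(1.0 / alpha) / n)

def sequential_width(n, alpha, C=1.7):
    """Widen the offline bound by an iterated-logarithm factor so the band
    can be monitored at every sample size n (C is illustrative only)."""
    return C * fixed_time_width(n, alpha) * np.sqrt(np.log(np.log(max(n, 3))))

rng = np.random.default_rng(0)
for n in (50, 200, 800):
    xs = rng.normal(0.0, 1.0, size=n)
    ys = rng.normal(0.5, 1.0, size=n)
    est = np.sqrt(max(empirical_mmd2(xs, ys), 0.0))
    w = sequential_width(n, alpha=0.05)
    print(f"n={n}: MMD in [{max(est - w, 0.0):.3f}, {est + w:.3f}]")
```

The only point of the sketch is the shape of the width: the sequential band pays a $\sqrt{\log\log n}$ factor over the fixed-time width, which is the iterated logarithmic price the abstract describes.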
Related papers
- A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set [20.166217494056916]
We propose a principled approach to construct covariance estimators without imposing restrictive assumptions.
We show that our robust estimators are efficiently computable and consistent.
Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.
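As background for the spectrum-shrinking idea in the title, a textbook linear shrinkage estimator (not the paper's distributionally robust construction; `alpha` is a hypothetical tuning knob) inflates small eigenvalues and keeps the estimate well-conditioned even when samples are scarce:

```python
import numpy as np

def linear_shrinkage_covariance(X, alpha=0.2):
    """Blend the sample covariance with a scaled identity target:
    (1 - alpha) * S + alpha * (tr(S)/p) * I."""
    S = np.cov(X, rowvar=False)
    mu = np.trace(S) / S.shape[0]          # average sample variance
    return (1.0 - alpha) * S + alpha * mu * np.eye(S.shape[0])

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))              # fewer samples than dimensions
S = np.cov(X, rowvar=False)
Sigma = linear_shrinkage_covariance(X, alpha=0.3)
print(np.linalg.eigvalsh(S)[0], np.linalg.eigvalsh(Sigma)[0])
# the raw sample covariance is singular; the shrunk estimate is not
```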
arXiv Detail & Related papers (2024-05-30T15:01:18Z)
- Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
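For readers unfamiliar with the technique in the blurb above, here is the classical random Fourier features construction for the Gaussian kernel (the i.i.d. baseline the paper's couplings aim to improve on; the bandwidth and feature count below are illustrative):

```python
import numpy as np

def random_fourier_features(X, num_features, bandwidth=1.0, rng=None):
    """Map rows of X (n, d) to features whose inner products approximate
    the Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    rng = rng if rng is not None else np.random.default_rng(0)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / bandwidth, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, num_features=4000, rng=rng)
exact = np.exp(-np.sum((X[0] - X[1]) ** 2) / 2.0)
print(exact, Z[0] @ Z[1])   # the two values should be close
```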
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
- Subgradient Convergence Implies Subdifferential Convergence on Weakly Convex Functions: With Uniform Rates Guarantees [2.719510212909501]
In nonsmooth stochastic optimization, understanding the uniform convergence of subdifferential mappings is crucial for analyzing stationary points of sample average approximations of risk.
This work connects the uniform convergence of subgradient mappings to the uniform convergence of subdifferential mappings.
We derive uniform convergence rates for the subdifferentials of convex-composite functions, measured by the Hausdorff metric.
arXiv Detail & Related papers (2024-05-16T17:49:46Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
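As a reference point for the blurb, a minimal sketch of standard AIS between two one-dimensional Gaussians (the classical algorithm with a linear temperature schedule, not the paper's constant-rate variant; the schedule and step size are illustrative):

```python
import numpy as np

def log_prior(x):    # tractable start: N(0, 1), unnormalized
    return -0.5 * x ** 2

def log_target(x):   # intractable stand-in: N(3, 0.5^2), unnormalized
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def ais(num_samples=2000, num_steps=100, step_size=0.5, rng=None):
    """Anneal from the prior to the target along a linear beta schedule,
    accumulating importance weights and applying one random-walk
    Metropolis move per temperature."""
    rng = rng if rng is not None else np.random.default_rng(0)
    betas = np.linspace(0.0, 1.0, num_steps + 1)
    x = rng.normal(size=num_samples)            # exact draws from the prior
    log_w = np.zeros(num_samples)
    for t in range(1, num_steps + 1):
        # weight update from raising the temperature one notch
        log_w += (betas[t] - betas[t - 1]) * (log_target(x) - log_prior(x))
        # Metropolis move targeting the intermediate distribution at betas[t]
        log_pi = lambda z: (1 - betas[t]) * log_prior(z) + betas[t] * log_target(z)
        prop = x + step_size * rng.normal(size=num_samples)
        accept = np.log(rng.uniform(size=num_samples)) < log_pi(prop) - log_pi(x)
        x = np.where(accept, prop, x)
    return x, log_w

x, log_w = ais()
m = log_w.max()
print(m + np.log(np.mean(np.exp(log_w - m))))
# near log(0.5), the true log ratio of normalizing constants here
```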
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear stochastic differential equations.
The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to observations of the process.
arXiv Detail & Related papers (2023-05-24T20:43:47Z)
- Approximation of optimization problems with constraints through kernel Sum-Of-Squares [77.27820145069515]
We show that pointwise inequalities are turned into equalities within a class of nonnegative kSoS functions.
We also show that focusing on pointwise equality constraints enables the use of scattering inequalities to mitigate the curse of dimensionality in sampling the constraints.
arXiv Detail & Related papers (2023-01-16T10:30:04Z)
- The rate of convergence of Bregman proximal methods: Local geometry vs. regularity vs. sharpness [33.48987613928269]
We show that the convergence rate of a given method depends sharply on its associated Legendre exponent.
In particular, we show that boundary solutions exhibit a stark separation between methods with a zero and non-zero Legendre exponent.
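To ground the terminology, a Bregman proximal (mirror descent) step with the negative-entropy generator, whose Bregman divergence is the KL divergence; on the probability simplex the update is closed-form (a textbook sketch, unrelated to the paper's Legendre-exponent analysis):

```python
import numpy as np

def bregman_proximal_step(x, grad, step):
    """Solve argmin_y <step * grad, y> + KL(y || x) over the simplex;
    the minimizer is the multiplicative-weights update below."""
    y = x * np.exp(-step * grad)
    return y / y.sum()

# minimize the linear cost <c, x> over the simplex: mass should
# concentrate on the coordinate with the smallest cost (index 1)
c = np.array([0.9, 0.3, 0.7])
x = np.full(3, 1.0 / 3.0)
for _ in range(200):
    x = bregman_proximal_step(x, c, step=0.5)
print(np.round(x, 4))
```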
arXiv Detail & Related papers (2022-11-15T10:49:04Z)
- A lower confidence sequence for the changing mean of non-negative right heavy-tailed observations with bounded mean [9.289846887298854]
A confidence sequence (CS) produces an adapted sequence of sets for a predictable parameter sequence with a time-uniform coverage guarantee.
This work constructs a non-asymptotic lower CS for the running average conditional expectation whose slack converges to zero.
arXiv Detail & Related papers (2022-10-20T09:50:05Z)
- The Connection between Discrete- and Continuous-Time Descriptions of Gaussian Stochastic Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of 'invariance under coarse-graining'.
This result explains why combining differencing schemes for derivative reconstruction and local-in-time inference approaches does not work for time series analysis of second or higher order stochastic differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z)
- Fused-Lasso Regularized Cholesky Factors of Large Nonstationary Covariance Matrices of Longitudinal Data [0.0]
Smoothness of the subdiagonals of the Cholesky factor of large covariance matrices is closely related to the degree of nonstationarity of autoregressive models for time series and longitudinal data.
We propose an algorithm for sparse estimation of the Cholesky factor which decouples row-wise.
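The row-wise decoupling has a concrete classical form: in the modified Cholesky decomposition of the precision matrix, each row of the unit lower-triangular factor comes from regressing one coordinate on its predecessors, so the rows can be estimated independently (a plain least-squares sketch, without the paper's fused-lasso penalty):

```python
import numpy as np

def cholesky_factor_by_rows(X):
    """Estimate T (unit lower-triangular) and innovation variances d in the
    modified Cholesky decomposition Sigma^{-1} = T' diag(d)^{-1} T: row j of
    T holds the negated coefficients of regressing X[:, j] on X[:, :j]."""
    n, p = X.shape
    T, d = np.eye(p), np.empty(p)
    d[0] = X[:, 0].var()
    for j in range(1, p):                      # each row solved independently
        phi, *_ = np.linalg.lstsq(X[:, :j], X[:, j], rcond=None)
        T[j, :j] = -phi
        d[j] = (X[:, j] - X[:, :j] @ phi).var()
    return T, d

rng = np.random.default_rng(0)
p, rho = 6, 0.7
X = np.empty((5000, p))
X[:, 0] = rng.normal(size=5000)
for j in range(1, p):                          # AR(1)-style longitudinal data
    X[:, j] = rho * X[:, j - 1] + rng.normal(size=5000)
T, _ = cholesky_factor_by_rows(X)
print(np.round(T[3], 2))   # subdiagonal entry near -rho, others near zero
```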
arXiv Detail & Related papers (2020-07-22T02:38:16Z)