Divergence Inequalities with Applications in Ergodic Theory
- URL: http://arxiv.org/abs/2411.17241v1
- Date: Tue, 26 Nov 2024 09:06:20 GMT
- Title: Divergence Inequalities with Applications in Ergodic Theory
- Authors: Ian George, Alice Zheng, Akshay Bansal
- Abstract summary: We establish a simple method for deriving Pinsker inequalities, as well as general bounds in terms of $\chi^{2}$-divergences, for twice-differentiable $f$-divergences.
We show that for many $f$-divergences, the rate of contraction of a time-homogeneous Markov chain is characterized by the input-dependent contraction coefficient of the $\chi^{2}$-divergence.
We extend these results to the Petz $f$-divergences in quantum information theory, albeit without any guarantee of efficient computation.
- Score: 1.024113475677323
- Abstract: The data processing inequality is central to information theory and motivates the study of monotonic divergences. However, it is not clear, operationally, whether we need to consider all such divergences. We establish a simple method for deriving Pinsker inequalities, as well as general bounds in terms of $\chi^{2}$-divergences, for twice-differentiable $f$-divergences. These tools imply new relations for input-dependent contraction coefficients. We use these relations to show that, for many $f$-divergences, the rate of contraction of a time-homogeneous Markov chain is characterized by the input-dependent contraction coefficient of the $\chi^{2}$-divergence. This coefficient is efficient to compute and gives the fastest possible rate of convergence for a class of divergences. We show that similar ideas hold for mixing times. Moreover, we extend these results to the Petz $f$-divergences in quantum information theory, albeit without any guarantee of efficient computation. These tools may have applications in other settings where iterative data processing is relevant.
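The abstract gives no formulas, but the "efficient to compute" claim has a classical linear-algebraic reading for finite chains: the input-dependent contraction coefficient of the $\chi^{2}$-divergence equals the squared second-largest singular value of a rescaled transition matrix. The sketch below illustrates that standard fact; the function name `chi2_contraction` and the toy chain are ours, not the paper's code.

```python
import numpy as np

def chi2_contraction(P: np.ndarray, pi: np.ndarray) -> float:
    """Input-dependent chi^2 contraction coefficient eta_{chi^2}(P, pi).

    For a row-stochastic P and strictly positive input distribution pi,
    this equals the squared second-largest singular value of
    A = D_{pi P}^{-1/2} P^T D_{pi}^{1/2} (a classical fact; the largest
    singular value of A is always 1, attained at sqrt(pi)).
    """
    pi_out = pi @ P                                  # output distribution pi P
    A = np.diag(pi_out ** -0.5) @ P.T @ np.diag(pi ** 0.5)
    s = np.linalg.svd(A, compute_uv=False)           # singular values, descending
    return s[1] ** 2

# Toy example: a lazy two-state chain with a uniform input distribution.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
eta = chi2_contraction(P, pi)
print(eta)  # chi^2(mu P || pi P) <= eta * chi^2(mu || pi) for all mu
```

A single SVD suffices, which is what makes the $\chi^{2}$ coefficient tractable compared with contraction coefficients of general $f$-divergences.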
Related papers
- Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
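For orientation, here is a minimal random Fourier features sketch for the Gaussian kernel, i.e., the standard i.i.d. baseline that variance-reducing couplings aim to improve on. It is an illustration of the general RF technique, not code or a coupling from the paper.

```python
import numpy as np

def rff_features(X: np.ndarray, num_features: int, lengthscale: float = 1.0,
                 rng=None) -> np.ndarray:
    """Random Fourier features for k(x, y) = exp(-||x - y||^2 / (2 l^2)).

    Returns Phi with E[Phi(x) @ Phi(y)] = k(x, y) (Rahimi & Recht, 2007).
    Standard i.i.d. sampling of the frequencies W; coupling schemes
    replace these independent draws with correlated ones.
    """
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# The Monte Carlo kernel estimate improves as the number of features grows.
X = np.random.default_rng(1).normal(size=(5, 3))
Phi = rff_features(X, num_features=4096)
K_approx = Phi @ Phi.T
K_exact = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / 2.0)
print(np.max(np.abs(K_approx - K_exact)))  # small for large num_features
```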
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
- Computing Marginal and Conditional Divergences between Decomposable Models with Applications [7.89568731669979]
We propose an approach to compute the exact alpha-beta divergence between any marginal or conditional distribution of two decomposable models.
We show how our method can be used to analyze distributional changes by first applying it to a benchmark image dataset.
Based on our framework, we propose a novel way to quantify the error in contemporary superconducting quantum computers.
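For reference, the alpha-beta (AB) divergence has a closed form for discrete distributions; one common parameterization is that of Cichocki, Cruces, and Amari. The sketch below evaluates it directly on two flattened distributions, which is exactly the exponential-cost baseline that the paper's decomposable-model method avoids. The function and example are illustrative, not from the paper.

```python
import numpy as np

def ab_divergence(p: np.ndarray, q: np.ndarray, alpha: float, beta: float) -> float:
    """AB-divergence D_AB^{(alpha,beta)}(p || q), one common parameterization:

      -1/(a*b) * sum_i [ p_i^a q_i^b - (a p_i^(a+b) + b q_i^(a+b)) / (a+b) ]

    Requires a, b, a+b != 0 (the excluded cases are limits, e.g. KL).
    Nonnegative for a, b > 0 by weighted AM-GM, zero iff p == q.
    """
    a, b = alpha, beta
    assert a != 0 and b != 0 and a + b != 0, "limit cases need separate formulas"
    terms = p**a * q**b - (a * p**(a + b) + b * q**(a + b)) / (a + b)
    return -terms.sum() / (a * b)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(ab_divergence(p, q, alpha=1.0, beta=1.0))  # sum (p-q)^2 / 2 in this case
```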
arXiv Detail & Related papers (2023-10-13T14:17:25Z)
- Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds [51.68005047958965]
We show that intrinsic Gaussian processes can achieve better performance in practice than their extrinsic counterparts.
Our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency.
arXiv Detail & Related papers (2023-09-19T20:30:58Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Compression and Data Similarity: Combination of Two Techniques for
Communication-Efficient Solving of Distributed Variational Inequalities [137.6408511310322]
In this paper we consider a combination of two popular approaches: compression and data similarity.
We show that this synergy can be more effective than each of the approaches separately in solving distributed smooth, strongly monotone variational inequalities.
arXiv Detail & Related papers (2022-06-19T16:38:56Z) - An Improved Analysis of Gradient Tracking for Decentralized Machine
Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agents' common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency of the convergence rate on the network parameter $p$ in the noiseless case.
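As a concrete illustration of this setup, here is a minimal gradient-tracking sketch for decentralized least squares. The update rule is the standard gradient-tracking recursion; the gossip matrix, step size, and data are toy choices of ours, not the paper's parameters.

```python
import numpy as np

# Decentralized least squares: agent i holds (A_i, b_i); the network jointly
# minimizes (1/n) * sum_i ||A_i x - b_i||^2 via gradient tracking:
#   x <- W (x - gamma * y)                     (gossip + gradient step)
#   y <- W y + grad(x_new) - grad(x_old)       (track the average gradient)
# The trackers preserve sum_i y_i = sum_i grad_i(x_i) at every iteration.
rng = np.random.default_rng(0)
n, d = 4, 3                                     # agents, dimension
A = rng.normal(size=(n, 5, d))
b = rng.normal(size=(n, 5))
W = np.full((n, n), 0.1) + 0.6 * np.eye(n)      # doubly stochastic gossip matrix

def grad(i, x):                                 # local gradient of agent i
    return A[i].T @ (A[i] @ x - b[i])

gamma = 0.02
x = np.zeros((n, d))
g_old = np.array([grad(i, x[i]) for i in range(n)])
y = g_old.copy()                                # initialize trackers
for _ in range(500):
    x = W @ (x - gamma * y)
    g_new = np.array([grad(i, x[i]) for i in range(n)])
    y = W @ y + g_new - g_old
    g_old = g_new

# All agents reach consensus and (approximately) minimize the average loss.
print(np.max(np.std(x, axis=0)))                # consensus gap, near 0
```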
arXiv Detail & Related papers (2022-02-08T12:58:14Z)
- A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning [3.655021726150368]
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared sequential decision-making problem.
We derive a novel law of iterated logarithm for a family of distributed nonlinear approximation schemes that is useful in MARL.
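For orientation, the classical scalar law of the iterated logarithm that results of this kind extend (the distributed, nonlinear versions in the paper are considerably more involved) reads as follows:

```latex
% Classical (scalar, i.i.d.) law of the iterated logarithm: for X_1, X_2, ...
% i.i.d. with mean 0 and variance sigma^2, the partial sums S_n satisfy
\[
  \limsup_{n \to \infty} \frac{S_n}{\sqrt{2 n \log \log n}} = \sigma
  \quad \text{almost surely.}
\]
```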
arXiv Detail & Related papers (2021-10-27T08:01:17Z)
- Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), lower communication-complexity bounds for distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z)
- $(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics [6.221019624345409]
We develop a framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs).
We show that they can be expressed as a two-stage mass-redistribution/mass-transport process.
Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions.
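The two-stage structure mentioned above admits an infimal-convolution form in the $(f,\Gamma)$-divergence literature; the display below is a sketch of that characterization with assumed notation, where $W^{\Gamma}$ denotes the IPM induced by the function class $\Gamma$, not an equation copied from the paper.

```latex
% Infimal-convolution form of the (f,Gamma)-divergence: mass is first
% redistributed from nu to an intermediate measure eta (paying D_f), then
% transported from eta to mu (paying the IPM W^Gamma induced by Gamma).
\[
  D_f^{\Gamma}(\mu \,\|\, \nu)
    = \inf_{\eta} \Big\{ W^{\Gamma}(\mu, \eta) + D_f(\eta \,\|\, \nu) \Big\},
  \qquad
  W^{\Gamma}(\mu, \eta) := \sup_{g \in \Gamma}
    \big\{ \mathbb{E}_{\mu}[g] - \mathbb{E}_{\eta}[g] \big\}.
\]
```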
arXiv Detail & Related papers (2020-11-11T18:17:09Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarially robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
- Cramér-Rao Lower Bounds Arising from Generalized Csiszár Divergences [17.746238062801293]
We study the geometry of probability distributions with respect to a generalized family of Csiszár $f$-divergences.
We show that these formulations lead us to find unbiased and efficient estimators for the escort model.
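For context, the classical Cramér-Rao bound that such divergence-geometric analyses generalize is stated below; the paper derives analogues for the escort model, while this display is only the standard scalar version.

```latex
% Classical Cramer-Rao lower bound: for an unbiased estimator hat-theta
% of theta, the variance is bounded below by the inverse Fisher information.
\[
  \operatorname{Var}_{\theta}\big(\hat{\theta}\big) \;\ge\; \frac{1}{I(\theta)},
  \qquad
  I(\theta) := \mathbb{E}_{\theta}\!\Big[\big(\partial_{\theta}
    \log p_{\theta}(X)\big)^{2}\Big].
\]
```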
arXiv Detail & Related papers (2020-01-14T13:41:13Z)