Shifted Composition III: Local Error Framework for KL Divergence
- URL: http://arxiv.org/abs/2412.17997v1
- Date: Mon, 23 Dec 2024 21:40:01 GMT
- Title: Shifted Composition III: Local Error Framework for KL Divergence
- Authors: Jason M. Altschuler, Sinho Chewi
- Abstract summary: Coupling arguments are a central tool for bounding the deviation between two processes. We adapt coupling arguments to the Kullback-Leibler (KL) divergence.
- Score: 12.93725028754563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coupling arguments are a central tool for bounding the deviation between two stochastic processes, but traditionally have been limited to Wasserstein metrics. In this paper, we apply the shifted composition rule--an information-theoretic principle introduced in our earlier work--in order to adapt coupling arguments to the Kullback-Leibler (KL) divergence. Our framework combines the strengths of two previously disparate approaches: local error analysis and Girsanov's theorem. Akin to the former, it yields tight bounds by incorporating the so-called weak error, and is user-friendly in that it only requires easily verified local assumptions; and akin to the latter, it yields KL divergence guarantees and applies beyond Wasserstein contractivity. We apply this framework to the problem of sampling from a target distribution $\pi$. Here, the two stochastic processes are the Langevin diffusion and an algorithmic discretization thereof. Our framework provides a unified analysis when $\pi$ is assumed to be strongly log-concave (SLC), weakly log-concave (WLC), or to satisfy a log-Sobolev inequality (LSI). Among other results, this yields KL guarantees for the randomized midpoint discretization of the Langevin diffusion. Notably, our result: (1) yields the optimal $\tilde O(\sqrt d/\epsilon)$ rate in the SLC and LSI settings; (2) is the first result to hold beyond the 2-Wasserstein metric in the SLC setting; and (3) is the first result to hold in \emph{any} metric in the WLC and LSI settings.
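To make the discretization concrete, here is a minimal sketch of the randomized midpoint discretization of the (overdamped) Langevin diffusion in the standard form from the sampling literature; the function names and the Gaussian test target are illustrative, not code from this paper.

```python
import numpy as np

def randomized_midpoint_langevin(grad_V, x0, step, n_steps, rng=None):
    """Randomized midpoint discretization of dX_t = -grad V(X_t) dt + sqrt(2) dB_t.

    Minimal sketch in the standard form from the sampling literature, not code
    from the paper. grad_V is the gradient of the potential V, where the target
    is pi proportional to exp(-V).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    d = x.shape[0]
    for _ in range(n_steps):
        u = rng.uniform()                                           # random midpoint time in [0, 1)
        b_mid = rng.normal(scale=np.sqrt(u * step), size=d)         # B_{uh}
        b_rest = rng.normal(scale=np.sqrt((1 - u) * step), size=d)  # B_h - B_{uh}
        x_mid = x - u * step * grad_V(x) + np.sqrt(2.0) * b_mid     # state at the midpoint
        x = x - step * grad_V(x_mid) + np.sqrt(2.0) * (b_mid + b_rest)  # full step
    return x

# Example: target pi = N(0, I_2), so V(x) = |x|^2 / 2 and grad V(x) = x.
xs = np.stack([randomized_midpoint_langevin(lambda x: x, np.zeros(2), 0.1, 200, rng=i)
               for i in range(500)])
print(xs.mean(axis=0), xs.var(axis=0))  # roughly (0, 0) and (1, 1)
```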
Related papers
- Convergence Rate Analysis of LION [54.28350823319057]
We show that LION converges to a Karush-Kuhn-Tucker (KKT) point at the rate $\mathcal{O}(\sqrt{d}K^{-1/4})$ measured by the gradient norm, where $d$ is the problem dimension and $K$ is the number of iterations.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z)
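For context, the LION update analyzed above moves parameters along the sign of an interpolated momentum; below is a minimal NumPy sketch of the published update rule (defaults follow the original LION paper, not the convergence analysis above).

```python
import numpy as np

def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One LION update: parameters move along the sign of an interpolated
    momentum, with decoupled weight decay. Sketch of the published update
    rule (defaults from the original LION paper), not the analysis above."""
    c = beta1 * m + (1.0 - beta1) * grad            # interpolation used only for the sign
    theta = theta - lr * (np.sign(c) + wd * theta)  # sign step + decoupled weight decay
    m = beta2 * m + (1.0 - beta2) * grad            # momentum is tracked with beta2
    return theta, m
```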
- KPZ scaling from the Krylov space [83.88591755871734]
Recently, a superdiffusion exhibiting the Kardar-Parisi-Zhang scaling in late-time correlators and autocorrelators has been reported.
Inspired by these results, we explore the KPZ scaling in correlation functions using their realization in the Krylov operator basis.
arXiv Detail & Related papers (2024-06-04T20:57:59Z)
- Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach [49.97755400231656]
We show that a novel accelerated DDPM sampler achieves accelerated performance for three broad distribution classes not considered before.
Our results show an improved dependency on the data dimension $d$ among accelerated DDPM type samplers.
arXiv Detail & Related papers (2024-02-21T16:11:47Z)
- Optimization of Time-Dependent Decoherence Rates and Coherent Control for a Qutrit System [77.34726150561087]
Incoherent control makes the decoherence rates depend on time in a specific controlled manner.
We consider the problem of maximizing the Hilbert-Schmidt overlap between the system's final state $\rho(T)$ and a given target state $\rho_{\rm target}$.
arXiv Detail & Related papers (2023-08-08T01:28:50Z)
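For reference, the Hilbert-Schmidt overlap maximized above is the trace inner product of the two density matrices (standard definition; notation illustrative):

```latex
% Hilbert-Schmidt overlap between the final state rho(T) and the target state:
\langle \rho(T), \rho_{\mathrm{target}} \rangle_{\mathrm{HS}}
  = \operatorname{Tr}\!\bigl[\rho(T)\,\rho_{\mathrm{target}}\bigr],
% which equals 1 when both states are equal and pure.
```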
- Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning [9.31522898261934]
We investigate the impact of compression on gradient algorithms for machine learning.
We highlight differences in terms of convergence rates between several unbiased compression operators.
We extend our results to the case of federated learning.
arXiv Detail & Related papers (2023-08-02T18:02:00Z)
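As one concrete example of the unbiased compression operators compared above, here is a rand-k sparsifier (a standard construction, not necessarily the paper's exact operator set; names are illustrative).

```python
import numpy as np

def rand_k(x, k, rng):
    """Rand-k sparsification: keep k random coordinates, rescale by d/k.
    The rescaling makes the operator unbiased: E[rand_k(x)] = x. A standard
    example of the unbiased compressors compared in the paper, not its exact set."""
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = (d / k) * x[idx]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=8)
est = np.mean([rand_k(x, 2, rng) for _ in range(20000)], axis=0)
print(np.abs(est - x).max())  # small: the compressor is unbiased
```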
- Convergence of the Inexact Langevin Algorithm and Score-based Generative Models in KL Divergence [4.974890682815778]
We study the Inexact Langevin Dynamics (ILD), Inexact Langevin Algorithm (ILA), and Score-based Generative Modeling (SGM) when utilizing estimated score functions for sampling.
arXiv Detail & Related papers (2022-11-02T23:12:59Z)
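A minimal sketch of the Inexact Langevin Algorithm studied above: the unadjusted Langevin update driven by an estimated score (function names are illustrative, and the score perturbation in the example is an arbitrary stand-in for estimation error).

```python
import numpy as np

def inexact_langevin(score_est, x0, step, n_steps, rng=None):
    """Inexact Langevin Algorithm: the unadjusted Langevin update driven by an
    estimated score s(x) ~ grad log pi(x). Sketch under standard conventions;
    function names are illustrative."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        x = x + step * score_est(x) + np.sqrt(2.0 * step) * rng.normal(size=x.shape)
    return x

# Example: the true score of N(0, 1) is -x; the sine term plays the estimation error.
x = inexact_langevin(lambda x: -x + 0.01 * np.sin(x), np.zeros(1), 0.05, 500, rng=0)
```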
- Riemannian optimization for non-centered mixture of scaled Gaussian distributions [17.855338784378]
This paper studies the statistical model of the non-centered mixture of scaled Gaussian distributions (NC-MSG).
Using the Fisher-Rao information geometry associated with this distribution, we derive a Riemannian gradient descent algorithm.
A nearest centroid classifier is implemented leveraging the KL divergence and its associated center of mass.
arXiv Detail & Related papers (2022-09-07T17:22:20Z)
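The NC-MSG geometry above is specialized; as a generic illustration of a nearest-centroid rule built on the KL divergence, here is the closed-form Gaussian case (standard formula; helper names are hypothetical).

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)). Standard formula, shown
    only to illustrate KL-based nearest-centroid classification; the paper
    itself works with NC-MSG parameters, not plain Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def nearest_centroid(mu, cov, centroids):
    """Assign (mu, cov) to the centroid minimizing the KL divergence."""
    return int(np.argmin([kl_gaussian(mu, cov, mu_c, cov_c)
                          for mu_c, cov_c in centroids]))
```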
- $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus [66.62731854746856]
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information.
In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion-based optimization.
We demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets.
arXiv Detail & Related papers (2022-06-22T02:42:04Z)
- Utilising the CLT Structure in Stochastic Gradient based Sampling: Improved Analysis and Faster Algorithms [14.174806471635403]
We consider approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamics (IPD).
We observe that the noise introduced by the approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian.
We harness this structure to absorb the approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms.
arXiv Detail & Related papers (2022-06-08T10:17:40Z)
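To make the setting above concrete, here is a minimal SGLD sketch in which the only non-Gaussian noise is the minibatch subsampling error that the CLT argument treats as nearly Gaussian (function names and the unbiased-estimate form are illustrative, not the paper's code).

```python
import numpy as np

def sgld(grad_V_i, n_data, x0, step, n_steps, batch, rng=None):
    """Stochastic Gradient Langevin Dynamics: the full gradient of
    V = sum_i V_i is replaced by an unbiased minibatch estimate. By the CLT
    this subsampling noise is nearly Gaussian, which is the structure the
    paper exploits. Sketch only, with illustrative names."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        idx = rng.choice(n_data, size=batch, replace=False)
        g = (n_data / batch) * sum(grad_V_i(x, i) for i in idx)  # unbiased gradient estimate
        x = x - step * g + np.sqrt(2.0 * step) * rng.normal(size=x.shape)
    return x
```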
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
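Below is a minimal sketch of the without-replacement sampling that the shuffling analyses above and the next entry study (generic scheme with illustrative names, not either paper's exact algorithm).

```python
import numpy as np

def sgd_random_reshuffling(grad_i, n, x0, lr, epochs, rng=None):
    """Without-replacement SGD (Random Reshuffling): every epoch visits each
    component gradient exactly once in a fresh random order, in contrast to
    with-replacement sampling. Generic sketch of the scheme being analyzed."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        for i in rng.permutation(n):      # fresh shuffle each epoch
            x = x - lr * grad_i(x, i)     # one pass over all n components
    return x
```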
- Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality [3.960041987749073]
We conduct a convergence analysis of non-descent RR with diminishing step sizes based on the Kurdyka-Łojasiewicz (KL) inequality.
We derive the corresponding rate of convergence depending on the KL exponent and the suitably selected diminishing step sizes.
We also establish similar strong limit-point convergence results for the proximal point method.
arXiv Detail & Related papers (2021-10-10T23:20:04Z)
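For reference, a minimal statement of the Łojasiewicz-type form of the KL inequality commonly used in such analyses (notation illustrative, with $f^{*}$ the optimal value, $\partial f$ the subdifferential, and $\theta$ the KL exponent):

```latex
% Lojasiewicz-type form of the Kurdyka-Lojasiewicz inequality (illustrative):
% there exist c > 0 and an exponent theta such that, near the critical set,
\operatorname{dist}\!\bigl(0, \partial f(x)\bigr) \ge c \,\bigl(f(x) - f^{*}\bigr)^{\theta},
\qquad \theta \in [1/2, 1),
% with the achievable convergence rate degrading as theta increases toward 1.
```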
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
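A minimal sketch of averaged SEG on the bilinear game named above (x and y are float arrays; a scheduled restart would periodically reset the iterates to the running average, omitted here for brevity; names and noise model are illustrative, not the paper's code).

```python
import numpy as np

def seg_averaged(A, x, y, lr, n_steps, noise=0.1, rng=None):
    """Stochastic ExtraGradient with iterate averaging on the bilinear game
    min_x max_y x^T A y. Sketch only: the extrapolation step evaluates noisy
    gradients, and the update step re-evaluates them at the extrapolated point."""
    rng = np.random.default_rng(rng)
    x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)
    for t in range(1, n_steps + 1):
        gx = A @ y + noise * rng.normal(size=x.shape)    # noisy gradient in x
        gy = A.T @ x + noise * rng.normal(size=y.shape)  # noisy gradient in y
        x_half, y_half = x - lr * gx, y + lr * gy        # extrapolation step
        gx = A @ y_half + noise * rng.normal(size=x.shape)
        gy = A.T @ x_half + noise * rng.normal(size=y.shape)
        x, y = x - lr * gx, y + lr * gy                  # update from extrapolated point
        x_avg += (x - x_avg) / t                         # running averages of iterates
        y_avg += (y - y_avg) / t
    return x_avg, y_avg
```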
- Cumulant GAN [17.4556035872983]
We propose a novel loss function for training Generative Adversarial Networks (GANs).
We show that the corresponding optimization problem is equivalent to Rényi divergence minimization.
We experimentally demonstrate that image generation is more robust relative to Wasserstein GAN.
arXiv Detail & Related papers (2020-06-11T17:23:02Z)