Related papers: Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence

Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence

URL: http://arxiv.org/abs/2506.04700v2
Date: Wed, 05 Nov 2025 20:10:41 GMT
Title: Explicit Density Approximation for Neural Implicit Samplers Using a Bernstein-Based Convex Divergence
Authors: José Manuel de Frutos, Manuel A. Vázquez, Pablo M. Olmos, Joaquín Míguez,
Abstract summary: We introduce dual-ISL, a novel likelihood-free objective for training implicit generative models.<n>We show that these theoretical advantages translate into practical ones.
Score: 6.110760886913874
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Rank-based statistical metrics, such as the invariant statistical loss (ISL), have recently emerged as robust and practically effective tools for training implicit generative models. In this work, we introduce dual-ISL, a novel likelihood-free objective for training implicit generative models that interchanges the roles of the target and model distributions in the ISL framework, yielding a convex optimization problem in the space of model densities. We prove that the resulting rank-based discrepancy $d_K$ is i) continuous under weak convergence and with respect to the $L^1$ norm, and ii) convex in its first argument-properties not shared by classical divergences such as KL or Wasserstein distances. Building on this, we develop a theoretical framework that interprets $d_K$ as an $L^2$-projection of the density ratio $q = p/\tilde p$ onto a Bernstein polynomial basis, from which we derive exact bounds on the truncation error, precise convergence rates, and a closed-form expression for the truncated density approximation. We further extend our analysis to the multivariate setting via random one-dimensional projections, defining a sliced dual-ISL divergence that retains both convexity and continuity. We empirically show that these theoretical advantages translate into practical ones. Specifically, across several benchmarks dual-ISL converges more rapidly, delivers markedly smoother and more stable training, and more effectively prevents mode collapse than classical ISL and other leading implicit generative methods-while also providing an explicit density approximation.

Related papers

Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks, where information exchange may be asymmetric.<n>We develop a unified uniform-stability framework for the Gradient Push (SGP) algorithm.<n>A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z)
Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees [9.180350432640912]
We study the sampling efficiency of score-based discrete diffusion models under a continuous-time Markov chain (CTMC) formulation.<n>For uniform discrete diffusion, we show that the $$-leaping algorithm achieves an complexity of order $tilde O(d/varepsilon)$.<n>For masking discrete diffusion, we introduce a modified $$-leaping sampler whose convergence rate is governed by an intrinsic information-theoretic quantity.
arXiv Detail & Related papers (2026-02-16T18:48:17Z)
Entropy-Based Dimension-Free Convergence and Loss-Adaptive Schedules for Diffusion Models [3.2091923314854416]
Diffusion generative models synthesize samples by discretizing reverse-time dynamics driven by a learned score (or denoiser)<n>We develop an information-theoretic approach to dimension-free convergence that avoids geometric assumptions.<n>We also propose a Loss-Adaptive Schedule (LAS) for efficient discretization of reverse SDE which is lightweight and relies only on the training loss.
arXiv Detail & Related papers (2026-01-29T16:28:21Z)
Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback [50.89125374999765]
We provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($mathtOMWU$) in NLHF.<n>Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values.
arXiv Detail & Related papers (2025-12-31T12:08:29Z)
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees [70.88473359544084]
We introduce a new analytical approach for discrete diffusion models that removes the need for regularity assumptions.<n>For the standard $tau$-leaping method, we establish convergence guarantees in KL divergence that scale linearly with vocabulary size.<n>Our approach is also more broadly applicable: it provides the first convergence guarantees for other widely used samplers.
arXiv Detail & Related papers (2025-09-20T17:42:29Z)
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models.<n>Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement.<n>We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise [60.17850744118546]
First-order methods with clipping, such as Clip-SGD, exhibit stronger convergence guarantees than SGD under the $(L_$1)$-smoothness assumption.<n>We establish the first high-probability convergence bounds for Clip-SGD applied to convex $(L_$1)$-smooth optimization with heavytailed noise.
arXiv Detail & Related papers (2025-05-27T07:23:42Z)
Multi-Step Consistency Models: Fast Generation with Theoretical Guarantees [15.366598179769918]
We provide a theoretical analysis of consistency models capable of mapping inputs at a given time to arbitrary points along the reverse trajectory.<n>We show that one can achieve a KL divergence of order $ O(varepsilon2) $ using only $ Oleft(logleft(fracdvarepsilonright) $ iterations with a constant step size.<n>We conclude that accurate learning is feasible using small discretization steps, both in smooth and non-smooth settings.
arXiv Detail & Related papers (2025-05-02T06:50:46Z)
Advancing Wasserstein Convergence Analysis of Score-Based Models: Insights from Discretization and Second-Order Acceleration [5.548787731232499]
We focus on the Wasserstein convergence analysis of score-based diffusion models.<n>We compare various discretization schemes, including Euler discretization, exponential midpoint and randomization methods.<n>We propose an accelerated sampler based on the local linearization method.
arXiv Detail & Related papers (2025-02-07T11:37:51Z)
Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation [58.19676004192321]
Diffusion models (DMs), which enable both image generation from noise and inversion from data, have inspired powerful unpaired image-to-image (I2I) translation algorithms. We tackle this problem with Schrodinger Bridges (SBs), which are differential equations (SDEs) between distributions with minimal transport cost. Inspired by this observation, we propose Latent Schrodinger Bridges (LSBs) that approximate the SB ODE via pre-trained Stable Diffusion. We demonstrate that our algorithm successfully conduct competitive I2I translation in unsupervised setting with only a fraction of cost required by previous DM-
arXiv Detail & Related papers (2024-11-22T11:24:14Z)
E$^2$M: Double Bounded $α$-Divergence Optimization for Tensor-based Discrete Density Estimation [3.9633191508712398]
We present a generalization of the expectation-maximization (EM) algorithm, called E$2M algorithm.<n>It circumvents this issue by first relaxing the optimization into minimization of a surrogate objective based on the Kullback-Leibler (KL) divergence.<n>Our approach offers flexible modeling for not only a variety of low-rank structures, including the CP, Tucker, and Train formats.
arXiv Detail & Related papers (2024-05-28T14:28:28Z)
Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density. This method is statistically consistent, and makes the inductive validation model clear and consistent.
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE. We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements. The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively. We show that NAG-arity is competitive with state-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z)
Wasserstein Distributional Learning [5.830831796910439]
Wasserstein Distributional Learning (WDL) is a flexible density-on-scalar regression modeling framework. We show that WDL better characterizes and uncovers the nonlinear dependence of the conditional densities. We demonstrate the effectiveness of the WDL framework through simulations and real-world applications.
arXiv Detail & Related papers (2022-09-12T02:32:17Z)
On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD) We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting. We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
Spatially relaxed inference on high-dimensional linear models [48.989769153211995]
We study the properties of ensembled clustered inference algorithms which combine spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions. We show that ensembled clustered inference algorithms control the $delta$-FWER under standard assumptions for $delta$ equal to the largest cluster diameter.
arXiv Detail & Related papers (2021-06-04T16:37:19Z)
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance [14.06947898164194]
Heavy tails emerge in gradient descent (SGD) in various scenarios. We provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance. Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum.
arXiv Detail & Related papers (2021-02-20T13:45:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.