Exploring bidirectional bounds for minimax-training of Energy-based models
- URL: http://arxiv.org/abs/2506.04609v1
- Date: Thu, 05 Jun 2025 03:58:03 GMT
- Title: Exploring bidirectional bounds for minimax-training of Energy-based models
- Authors: Cong Geng, Jia Wang, Li Chen, Zhiyong Gao, Jes Frellsen, Søren Hauberg
- Abstract summary: Energy-based models (EBMs) estimate unnormalized densities in an elegant framework, but they are difficult to train. Recent work has linked EBMs to generative adversarial networks, by noting that they can be trained through a minimax game using a variational lower bound. We propose to instead work with bidirectional bounds, meaning that we maximize a lower bound and minimize an upper bound when training the EBM.
- Score: 32.543485829282005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models (EBMs) estimate unnormalized densities in an elegant framework, but they are generally difficult to train. Recent work has linked EBMs to generative adversarial networks, by noting that they can be trained through a minimax game using a variational lower bound. To avoid the instabilities caused by minimizing a lower bound, we propose to instead work with bidirectional bounds, meaning that we maximize a lower bound and minimize an upper bound when training the EBM. We investigate four different bounds on the log-likelihood derived from different perspectives. We derive lower bounds based on the singular values of the generator Jacobian and on mutual information. To upper bound the negative log-likelihood, we consider a gradient penalty-like bound, as well as one based on diffusion processes. In all cases, we provide algorithms for evaluating the bounds. We compare the different bounds to investigate the pros and cons of the different approaches. Finally, we demonstrate that the use of bidirectional bounds stabilizes EBM training and yields high-quality density estimation and sample generation.
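As a rough picture of the training scheme the abstract describes, here is a minimal sketch in PyTorch. The networks, the toy data, and both bound surrogates (a contrastive lower-bound term and a gradient-penalty-style upper-bound term) are illustrative assumptions, not the paper's exact Jacobian, mutual-information, or diffusion-based bounds.

```python
import torch
import torch.nn as nn

# Toy 2-D setup. Both bound surrogates below are illustrative stand-ins.
E = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 1))  # energy net
G = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 2))  # generator
opt_e = torch.optim.Adam(E.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)

for step in range(1000):
    x = torch.randn(256, 2) * 0.5 + 1.0      # toy data
    z = torch.randn(256, 2)

    # EBM step: maximize a log-likelihood lower bound and minimize an
    # upper bound on the negative log-likelihood (bidirectional bounds).
    x_fake = G(z).detach()
    ll_lower = (-E(x)).mean() + E(x_fake).mean()   # contrastive surrogate
    xr = x.clone().requires_grad_(True)
    grad_x = torch.autograd.grad(E(xr).sum(), xr, create_graph=True)[0]
    nll_upper = -ll_lower + grad_x.pow(2).sum(dim=1).mean()  # penalty-style
    opt_e.zero_grad()
    (nll_upper - ll_lower).backward()
    opt_e.step()

    # Generator step: the adversary tightens the variational bound by
    # moving its samples toward low-energy regions.
    opt_g.zero_grad()
    E(G(z)).mean().backward()
    opt_g.step()
```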
Related papers
- Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression [12.443289202402761]
We show the benefits of batch partitioning through the lens of a minimum-norm overparametrized linear regression model.
We characterize the optimal batch size and show it is inversely proportional to the noise level.
We also show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings.
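A minimal NumPy sketch of the batch-then-average minimum-norm estimator; the fixed shrinkage factor below is a hypothetical stand-in for the Wiener coefficient, and the data-generating model is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 500, 4                 # n samples, d >> n features, k batches
beta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta + 0.1 * rng.normal(size=n)

# Per-batch minimum-norm (pseudoinverse) estimates, then average.
estimates = [np.linalg.pinv(Xb) @ yb
             for Xb, yb in zip(np.array_split(X, k), np.array_split(y, k))]
beta_batch = np.mean(estimates, axis=0)

shrink = 0.9                          # stand-in for the Wiener coefficient
beta_shrunk = shrink * beta_batch
print(np.linalg.norm(beta_batch - beta), np.linalg.norm(beta_shrunk - beta))
```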
arXiv Detail & Related papers (2023-06-14T11:02:08Z) - Bridging Discrete and Backpropagation: Straight-Through and Beyond [62.46558842476455]
We propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables.
We propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs.
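ReinMax's second-order correction is beyond a short snippet, but the first-order straight-through estimator it builds on can be sketched in PyTorch as follows; the toy downstream objective at the end is an assumption.

```python
import torch
import torch.nn.functional as F

def straight_through_sample(logits):
    """Forward: discrete one-hot sample. Backward: softmax gradient.

    The identity `hard - soft.detach() + soft` routes gradients through
    the soft relaxation (first-order accurate; ReinMax adds a Heun-style
    second-order correction on top of this idea).
    """
    soft = F.softmax(logits, dim=-1)
    index = torch.multinomial(soft, 1)
    hard = F.one_hot(index.squeeze(-1), logits.shape[-1]).float()
    return hard - soft.detach() + soft

logits = torch.zeros(3, 5, requires_grad=True)
sample = straight_through_sample(logits)
reward = (sample * torch.arange(5.0)).sum()   # toy downstream objective
reward.backward()
print(logits.grad)                            # gradients flow via the soft path
```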
arXiv Detail & Related papers (2023-04-17T20:59:49Z) - Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization [48.12845778927164]
We consider the prospect of establishing minimax rates for gradient descent in the setting of convex optimization.
We consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm.
Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
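The "noisy surrogate" tactic can be pictured with a short NumPy sketch; the quadratic objective, step size, and noise scale are illustrative assumptions.

```python
import numpy as np

def noisy_surrogate_gd(grad, w0, lr=0.1, steps=100, sigma=0.01, seed=0):
    """Run plain gradient descent, then corrupt the final iterate with
    Gaussian noise -- the noisy 'surrogate' device from the abstract."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w + sigma * rng.normal(size=w.shape)

# Quadratic toy objective f(w) = ||w||^2, so grad(w) = 2w.
w_hat = noisy_surrogate_gd(lambda w: 2 * w, w0=[1.0, -1.0])
print(w_hat)
```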
arXiv Detail & Related papers (2022-12-27T17:16:48Z) - Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
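A loose sketch of descending an interaction energy over particles, assuming PyTorch and a standard Gaussian target; the simple "attraction plus mollified repulsion" energy below is a simplified stand-in, not the exact MIE family from the paper.

```python
import torch

def neg_log_target(x):                 # standard Gaussian target (assumption)
    return 0.5 * (x ** 2).sum(dim=1)

def interaction_energy(x, eps=0.1):
    diff = x[:, None, :] - x[None, :, :]
    sq = (diff ** 2).sum(-1)
    repulsion = torch.exp(-sq / (2 * eps)).sum()   # mollifier-style kernel
    return neg_log_target(x).sum() + 0.5 * repulsion

x = torch.randn(256, 2, requires_grad=True)        # particles
opt = torch.optim.Adam([x], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    interaction_energy(x).backward()
    opt.step()
```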
arXiv Detail & Related papers (2022-10-24T16:54:18Z) - Cooperative Distribution Alignment via JSD Upper Bound [7.071749623370137]
Unsupervised distribution alignment estimates a transformation that maps two or more source distributions to a shared aligned distribution.
This task has many applications including generative modeling, unsupervised domain adaptation, and socially aware learning.
We propose to unify and generalize previous flow-based approaches under a single non-adversarial framework.
arXiv Detail & Related papers (2022-07-05T20:09:03Z) - Bounds all around: training energy-based models with bidirectional bounds [26.507268387712145]
Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train.
Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function.
We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game.
arXiv Detail & Related papers (2021-11-01T13:25:38Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
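One way to picture bias-constrained estimation is to penalize the empirical per-parameter bias, estimated from several noise realizations per unknown. A toy PyTorch sketch under an assumed Gaussian measurement model and penalty weight; not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = 10.0                               # penalty weight (assumption)

for step in range(2000):
    theta = torch.rand(64, 1) * 4 - 2    # unknown parameters
    # Several noisy measurements per unknown let us estimate the bias.
    x = theta.unsqueeze(1) + 0.5 * torch.randn(64, 16, 1)
    resid = net(x) - theta.unsqueeze(1)
    mse = resid.pow(2).mean()
    bias = resid.mean(dim=1)             # per-theta empirical bias
    loss = mse + lam * bias.pow(2).mean()  # squared-bias penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
```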
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Sequential- and Parallel-Constrained Max-value Entropy Search via Information Lower Bound [9.09466320810472]
We focus on an information-theoretic approach called Max-value Entropy Search (MES).
We propose a novel constrained BO method called Constrained Max-value Entropy Search via Information lower BOund (CMES-IBO).
arXiv Detail & Related papers (2021-02-19T08:10:51Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce a MI minimization training scheme and further accelerate it with a negative sampling strategy.
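The CLUB bound itself is compact enough to sketch: I(x; y) <= E_{p(x,y)}[log q(y|x)] - E_{p(x)p(y)}[log q(y|x)], with q a learned variational approximation of p(y|x). A minimal PyTorch version, assuming a diagonal-Gaussian q and using shuffled pairs for the marginal term (log-densities are computed up to additive constants).

```python
import torch
import torch.nn as nn

class CLUB(nn.Module):
    """Contrastive Log-ratio Upper Bound of mutual information."""

    def __init__(self, x_dim, y_dim, hidden=64):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, y_dim))
        self.logvar = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, y_dim))

    def log_q(self, x, y):
        # Gaussian log-density of q(y|x), up to additive constants.
        mu, logvar = self.mu(x), self.logvar(x)
        return (-(y - mu) ** 2 / logvar.exp() - logvar).sum(dim=-1)

    def mi_upper_bound(self, x, y):
        positive = self.log_q(x, y)                           # paired samples
        negative = self.log_q(x, y[torch.randperm(len(y))])   # shuffled pairs
        return (positive - negative).mean()

    def learning_loss(self, x, y):
        # q(y|x) is fit by maximum likelihood on the paired samples.
        return -self.log_q(x, y).mean()
```

In an MI-minimization scheme one alternates steps that fit q via `learning_loss` with gradient steps on `mi_upper_bound` used as a penalty on the encoder producing x and y.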
arXiv Detail & Related papers (2020-06-22T05:36:16Z) - Lower bounds in multiple testing: A framework based on derandomized proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
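For reference, the Benjamini-Hochberg step-up procedure that the simulations compare against is a few lines of NumPy; the mixture of null and signal p-values at the end is an illustrative assumption.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing step-up rule
        rejected[order[:k + 1]] = True
    return rejected

pvals = np.concatenate([np.random.uniform(size=90),        # nulls
                        np.random.uniform(0, 0.001, 10)])  # signals
print(benjamini_hochberg(pvals).sum(), "rejections")
```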
arXiv Detail & Related papers (2020-05-07T19:59:51Z)