Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels
- URL: http://arxiv.org/abs/2506.07843v1
- Date: Mon, 09 Jun 2025 15:08:01 GMT
- Title: Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels
- Authors: Davide Carbone
- Abstract summary: Energy-Based Models (EBMs) provide a flexible framework for generative modeling. Traditional methods, such as contrastive divergence and score matching, introduce biases that can hinder accurate learning. We present a theoretical analysis of Jarzynski reweighting, a technique from non-equilibrium statistical mechanics, and its implications for training EBMs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Energy-Based Models (EBMs) provide a flexible framework for generative modeling, but their training remains theoretically challenging due to the need to approximate normalization constants and efficiently sample from complex, multi-modal distributions. Traditional methods, such as contrastive divergence and score matching, introduce biases that can hinder accurate learning. In this work, we present a theoretical analysis of Jarzynski reweighting, a technique from non-equilibrium statistical mechanics, and its implications for training EBMs. We focus on the role of the choice of the kernel and we illustrate these theoretical considerations in two key generative frameworks: (i) flow-based diffusion models, where we reinterpret Jarzynski reweighting in the context of stochastic interpolants to mitigate discretization errors and improve sample quality, and (ii) Restricted Boltzmann Machines, where we analyze its role in correcting the biases of contrastive divergence. Our results provide insights into the interplay between kernel choice and model performance, highlighting the potential of Jarzynski reweighting as a principled tool for generative learning.
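To make the reweighting idea concrete, the sketch below shows a minimal sequential-importance-weighting loop in the spirit of Jarzynski reweighting: walkers evolve under an unadjusted Langevin (ULA) transition kernel while the energy parameters change, and log-weights accumulate the energy change plus a forward/backward kernel correction so that weighted averages remain consistent estimates of model expectations (the negative phase of the EBM log-likelihood gradient). This is a hedged illustration, not the paper's algorithm: the quadratic energy, step sizes, and parameter schedule are assumptions chosen only so the example is self-contained and checkable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative quadratic energy U_theta(x) = 0.5 * theta * |x|^2 (a Gaussian model);
# in a real EBM this would be a neural-network energy. All names here are assumptions.
def energy(x, theta):
    return 0.5 * theta * np.sum(x ** 2, axis=-1)

def grad_energy(x, theta):
    return theta * x

def log_ula_kernel(x_to, x_from, theta, h):
    """Log density (up to a shared constant) of the ULA transition from x_from to x_to."""
    mean = x_from - h * grad_energy(x_from, theta)
    return -np.sum((x_to - mean) ** 2, axis=-1) / (4.0 * h)

n, dim, h = 512, 2, 1e-2
theta = 1.0
x = rng.standard_normal((n, dim)) / np.sqrt(theta)   # walkers start in equilibrium
log_w = np.zeros(n)                                   # Jarzynski log-weights

for k in range(500):
    theta_new = theta + 1e-3        # stand-in for one parameter (gradient) update
    # one ULA transition under the new parameters
    x_new = x - h * grad_energy(x, theta_new) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
    # weight update: energy change plus a forward/backward kernel correction,
    # compensating for the fact that ULA does not exactly preserve exp(-U_theta_new)
    log_w += (energy(x, theta) - energy(x_new, theta_new)
              + log_ula_kernel(x, x_new, theta_new, h)
              - log_ula_kernel(x_new, x, theta_new, h))
    x, theta = x_new, theta_new

# Self-normalized estimate of a model expectation, e.g. E_theta[|x|^2], which enters
# the negative phase of the EBM gradient; compare with the exact Gaussian value dim/theta.
w = np.exp(log_w - log_w.max())
print(np.sum(w * np.sum(x ** 2, axis=-1)) / np.sum(w), dim / theta)
```

The kernel-correction term in the weight update is what depends on the choice of transition kernel: for an exact or Metropolis-adjusted sampler it would not be needed, which is the kind of kernel dependence the abstract refers to.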
Related papers
- CTRLS: Chain-of-Thought Reasoning via Latent State-Transition [57.51370433303236]
Chain-of-thought (CoT) reasoning enables large language models to break down complex problems into interpretable intermediate steps. We introduce CTRLS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions. We show improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
arXiv Detail & Related papers (2025-07-10T21:32:18Z) - SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows [0.0]
This paper introduces Symmetry-Enforcing Stochastic Modulation (SESaMo). SESaMo enables the incorporation of biases (e.g., symmetries) into normalizing flows through a novel technique called inductive modulation. Our numerical experiments benchmark SESaMo in different scenarios, including an 8-Gaussian mixture model and physically relevant field theories.
arXiv Detail & Related papers (2025-05-26T07:34:11Z) - Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning [79.65014491424151]
We propose a quantum Discrete Denoising Diffusion Probabilistic Model (QD3PM). It enables joint probability learning through diffusion and denoising in exponentially large Hilbert spaces. This paper establishes a new theoretical paradigm in generative models by leveraging the quantum advantage in joint distribution learning.
arXiv Detail & Related papers (2025-05-08T11:48:21Z) - Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning [54.07840818762834]
Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guidance diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process.
arXiv Detail & Related papers (2025-05-03T14:00:25Z) - A theoretical framework for overfitting in energy-based modeling [5.1337384597700995]
We investigate the impact of limited data on training pairwise energy-based models for inverse problems aimed at identifying interaction networks. We show that optimal points for early stopping arise from the interplay between these timescales and the initial conditions of training. We propose a generalization to arbitrary energy-based models by deriving the neural tangent kernel dynamics of the score function under score matching.
arXiv Detail & Related papers (2025-01-31T14:21:02Z) - Physics-Informed Diffusion Models [0.0]
We present a framework that unifies generative modeling and partial differential equation fulfillment. Our approach reduces the residual error by up to two orders of magnitude compared to previous work in a fluid flow case study.
arXiv Detail & Related papers (2024-03-21T13:52:55Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Unmasking Bias in Diffusion Model Training [40.90066994983719]
Denoising diffusion models have emerged as a dominant approach for image generation.
However, they still suffer from slow convergence in training and color shift issues in sampling.
In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm.
arXiv Detail & Related papers (2023-10-12T16:04:41Z) - Learning Robust Models Using The Principle of Independent Causal Mechanisms [26.79262903241044]
We propose a new gradient-based learning framework whose objective function is derived from the ICM principle.
We show theoretically and experimentally that neural networks trained in this framework focus on relations remaining invariant across environments.
arXiv Detail & Related papers (2020-10-14T15:38:01Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success is still unclear.
We show that multiplicative noise commonly arises in the parameters of discrete optimization dynamics due to variance.
A detailed analysis is conducted in which we describe how key factors, including step size and data, all exhibit similar effects on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.