Efficient Training of Boltzmann Generators Using Off-Policy Log-Dispersion Regularization
- URL: http://arxiv.org/abs/2602.03729v1
- Date: Tue, 03 Feb 2026 16:49:32 GMT
- Title: Efficient Training of Boltzmann Generators Using Off-Policy Log-Dispersion Regularization
- Authors: Henrik Schopmans, Christopher von Klitzing, Pascal Friederich
- Abstract summary: Boltzmann generators are generative models that enable independent sampling from the Boltzmann distribution of physical systems at a given temperature. We propose off-policy log-dispersion regularization (LDR), a novel regularization framework that builds on a generalization of the log-variance objective. LDR acts as a shape regularizer of the energy landscape by leveraging additional information in the form of target energy labels.
- Score: 4.651750987298774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sampling from unnormalized probability densities is a central challenge in computational science. Boltzmann generators are generative models that enable independent sampling from the Boltzmann distribution of physical systems at a given temperature. However, their practical success depends on data-efficient training, as both simulation data and target energy evaluations are costly. To this end, we propose off-policy log-dispersion regularization (LDR), a novel regularization framework that builds on a generalization of the log-variance objective. We apply LDR in the off-policy setting in combination with standard data-based training objectives, without requiring additional on-policy samples. LDR acts as a shape regularizer of the energy landscape by leveraging additional information in the form of target energy labels. The proposed regularization framework is broadly applicable, supporting unbiased or biased simulation datasets as well as purely variational training without access to target samples. Across all benchmarks, LDR improves both final performance and data efficiency, with sample efficiency gains of up to one order of magnitude.
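For context, the log-variance objective that LDR generalizes measures the discrepancy between a model density q_theta and the unnormalized Boltzmann target exp(-beta * E(x)) as the variance, over a batch of reference samples, of the log-ratio between the two; the unknown partition function only shifts this log-ratio by a constant, which the variance removes, and the batch need not come from the model, which is what permits the off-policy use described above. Below is a minimal PyTorch sketch of this base objective; the LDR generalization itself is not specified in the abstract and is not shown.

```python
import torch

def log_variance_loss(log_q: torch.Tensor, energy: torch.Tensor,
                      beta: float = 1.0) -> torch.Tensor:
    """Log-variance divergence between a model density q_theta and the
    unnormalized Boltzmann target p(x) ~ exp(-beta * E(x)).

    log_q:  log q_theta(x_i) for a batch x_i drawn from any reference
            distribution (model samples, dataset, or biased simulation)
    energy: target energy labels E(x_i) for the same batch
    """
    # Log-ratio log q_theta(x) - log p~(x) = log q_theta(x) + beta * E(x);
    # the unknown log-partition function is a constant shift, to which
    # the variance is blind.
    log_ratio = log_q + beta * energy
    return log_ratio.var()
```

Because the batch can be any fixed set of energy-labeled configurations, a quantity of this form can be added to a standard data-based training objective as a regularizer, which matches the off-policy combination the abstract describes.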
Related papers
- Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs. We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation. To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy. (The group-relative advantage underlying GRPO is sketched after this entry.)
arXiv Detail & Related papers (2026-03-04T14:48:53Z)
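For background on the base algorithm being pruned: GRPO replaces a learned critic with group-relative advantages, normalizing each completion's reward against the other completions sampled for the same prompt. A minimal sketch of that normalization follows; DPPO's pruning criterion is not detailed in the blurb, so it is not shown.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor,
                              eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as used in GRPO.

    rewards: shape (num_prompts, group_size), one row of sampled
             completions per prompt.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Each completion is scored relative to its own group, so no value
    # network (critic) is needed.
    return (rewards - mean) / (std + eps)
```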
- Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling [42.79674268979455]
Energy-Weighted Flow Matching is a novel training objective enabling continuous normalizing flows to model Boltzmann distributions. Our algorithms demonstrate sample quality competitive with state-of-the-art energy-only methods. (A hedged sketch of energy-based reweighting follows this entry.)
arXiv Detail & Related papers (2025-09-03T21:16:03Z)
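The blurb does not give the exact objective, but one natural reading of "energy-weighted" is a flow-matching regression loss with self-normalized Boltzmann importance weights on the endpoint samples. The sketch below is speculative under that assumption; the weighting scheme, `beta`, and `log_rho` are illustrative, not the paper's definition.

```python
import torch

def energy_weighted_fm_loss(v_pred, v_target, energy, log_rho, beta=1.0):
    """Flow-matching loss with Boltzmann importance weights (assumed form).

    v_pred, v_target: predicted / target velocities, shape (batch, dim)
    energy:  target energies E(x1) of the data endpoints
    log_rho: log-density of the distribution the endpoints came from
    """
    # Self-normalized weights w_i ~ exp(-beta * E(x_i)) / rho(x_i),
    # tilting the data distribution toward the Boltzmann target.
    log_w = -beta * energy - log_rho
    w = torch.softmax(log_w, dim=0)
    per_sample = ((v_pred - v_target) ** 2).sum(dim=-1)
    return (w * per_sample).sum()
```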
- Energy-based Preference Optimization for Test-time Adaptation [4.379304291229695]
Test-Time Adaptation (TTA) approaches focus on adjusting the conditional distribution. These methods often depend on uncertain predictions in the absence of label information, leading to unreliable performance. Energy-based frameworks suggest a promising alternative to address distribution shifts without relying on uncertain predictions, instead computing the marginal distribution of target data.
arXiv Detail & Related papers (2025-05-26T07:21:32Z)
- Generalized EXTRA stochastic gradient Langevin dynamics [6.899153618328339]
Langevin algorithms are popular Markov chain Monte Carlo methods for Bayesian learning. Stochastic variants such as stochastic gradient Langevin dynamics (SGLD) allow iterative learning based on randomly sampled mini-batches. When data is decentralized across a network of agents subject to communication and privacy constraints, standard SGLD algorithms cannot be applied. (The basic SGLD update is sketched after this entry.)
arXiv Detail & Related papers (2024-12-02T21:57:30Z)
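For reference, the vanilla SGLD update that these decentralized variants generalize is a stochastic-gradient step on the log-posterior plus Gaussian noise scaled to the step size. A minimal NumPy sketch, with `grad_log_post` as a placeholder for the mini-batch gradient estimator:

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update:
        theta <- theta + (eta / 2) * grad log p(theta | minibatch) + sqrt(eta) * xi,
    where xi ~ N(0, I) and the gradient is estimated on a random mini-batch.
    """
    noise = rng.normal(scale=np.sqrt(step_size), size=np.shape(theta))
    return theta + 0.5 * step_size * grad_log_post(theta) + noise
```

The decentralized setting in the entry replaces the single gradient oracle with network-local gradients plus consensus corrections (the EXTRA part), which this sketch does not attempt.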
- AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [98.26836657967162]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios.
xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization. (A minimal instance-reweighting sketch follows this entry.)
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
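Instance-reweighted DRO with a KL constraint has a well-known closed-form inner solution: example weights proportional to exp(loss / temperature), which up-weights hard samples. A minimal sketch of that reweighting; the temperature and batch-level normalization are standard choices, not necessarily the paper's.

```python
import torch

def instance_reweighted_loss(per_example_loss: torch.Tensor,
                             tau: float = 1.0) -> torch.Tensor:
    """KL-regularized instance-reweighted DRO.

    Closed form of the inner maximization: w_i ~ exp(loss_i / tau),
    so harder examples receive larger weight.
    """
    # detach(): weights act as constants; gradients flow only through
    # the weighted loss term itself.
    w = torch.softmax(per_example_loss.detach() / tau, dim=0)
    return (w * per_example_loss).sum()
```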
- Efficient Generative Modeling via Penalized Optimal Transport Network [1.8079016557290342]
We propose a versatile deep generative model based on the marginally-penalized Wasserstein (MPW) distance. Through the MPW distance, POTNet effectively leverages low-dimensional marginal information to guide the overall alignment of joint distributions. We derive a non-asymptotic bound on the generalization error of the MPW loss and establish convergence rates of the generative distribution learned by POTNet. (A toy sketch of the marginal-penalty structure follows this entry.)
arXiv Detail & Related papers (2024-02-16T05:27:05Z)
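The blurb suggests the MPW distance augments a joint-distribution transport term with penalties on each one-dimensional marginal. The sketch below only illustrates that structure, using empirical 1-D Wasserstein distances computed by sorting and a sliced stand-in for the joint term; the penalty weight `lam` and the exact joint term are assumptions.

```python
import torch

def wasserstein_1d(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Empirical 1-Wasserstein distance between equal-size 1-D samples."""
    return (torch.sort(a).values - torch.sort(b).values).abs().mean()

def mpw_style_loss(gen: torch.Tensor, data: torch.Tensor,
                   lam: float = 1.0, n_proj: int = 8) -> torch.Tensor:
    """Joint-distribution term plus penalties on each coordinate marginal.

    gen, data: equal-size sample batches of shape (batch, dim).
    """
    # Crude stand-in for the joint term: average distance over random
    # 1-D projections (sliced Wasserstein).
    proj = torch.randn(gen.shape[1], n_proj)
    proj = proj / proj.norm(dim=0, keepdim=True)
    joint = torch.stack([
        wasserstein_1d(gen @ proj[:, k], data @ proj[:, k])
        for k in range(n_proj)
    ]).mean()
    # Marginal penalties: match each coordinate's 1-D distribution directly.
    marginals = torch.stack([
        wasserstein_1d(gen[:, j], data[:, j]) for j in range(gen.shape[1])
    ]).mean()
    return joint + lam * marginals
```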
- Iterated Denoising Energy Matching for Sampling from Boltzmann Densities [109.23137009609519]
Iterated Denoising Energy Matching (iDEM) alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our matching objective. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2$-$5\times$ faster. (The two-phase loop is sketched after this entry.)
arXiv Detail & Related papers (2024-02-09T01:11:23Z)
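A schematic of iDEM's alternation as described in the blurb: an outer loop proposes samples from the current diffusion sampler into a buffer, and an inner loop takes matching-objective steps on replayed samples. The `sampler` interface (`sample`, `matching_step`) and buffer policy below are assumed placeholders, not the paper's estimator.

```python
import random

class ReplayBuffer:
    """Fixed-capacity FIFO buffer of proposed samples."""
    def __init__(self, capacity: int = 10_000):
        self.items, self.capacity = [], capacity

    def extend(self, xs):
        self.items = (self.items + list(xs))[-self.capacity:]

    def sample(self, batch_size: int):
        return random.sample(self.items, min(batch_size, len(self.items)))

def idem_loop(sampler, num_rounds: int = 100, inner_steps: int = 10):
    """iDEM's bi-level loop: (I) propose points from the current
    diffusion sampler, (II) fit the matching objective off-policy."""
    buffer = ReplayBuffer()
    for _ in range(num_rounds):
        buffer.extend(sampler.sample(512))         # (I) exploration phase
        for _ in range(inner_steps):               # (II) matching phase
            sampler.matching_step(buffer.sample(128))
```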
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion Modeling [11.68361062474064]
We propose a theoretical framework to analyze the effect of the GAN discriminator on Langevin-based sampling. We show that the proposed approach can be combined with existing accelerated-diffusion techniques to improve latent-space image generation. (A standard discriminator-guided Langevin step is sketched after this entry.)
arXiv Detail & Related papers (2023-06-02T16:24:07Z)
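For background, the standard discriminator-guidance construction adds the gradient of the discriminator's logit, an estimate of the residual log density ratio between data and model, to the score inside each Langevin step. This is a generic sketch of that mechanism, not the entry's closed-form IPM analysis; `score_fn` and `disc_logit` are assumed callables.

```python
import torch

def guided_langevin_step(x, score_fn, disc_logit, step_size):
    """One unadjusted Langevin step with discriminator guidance.

    score_fn:   estimate of grad_x log p_model(x)
    disc_logit: discriminator logit d(x); grad_x d(x) approximates the
                residual ratio grad_x log(p_data(x) / p_model(x))
    """
    x = x.detach().requires_grad_(True)
    guidance = torch.autograd.grad(disc_logit(x).sum(), x)[0]
    drift = score_fn(x) + guidance
    noise = torch.randn_like(x)
    return (x + 0.5 * step_size * drift
            + (step_size ** 0.5) * noise).detach()
```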
- Efficient Training of Energy-Based Models Using Jarzynski Equality [13.636994997309307]
Energy-based models (EBMs) are generative models inspired by statistical physics. Computing the gradient of their training objective with respect to the model parameters requires sampling the model distribution. Here we show how results from nonequilibrium thermodynamics based on the Jarzynski equality can be used to perform this computation efficiently. (A weighted-gradient sketch follows this entry.)
arXiv Detail & Related papers (2023-05-30T21:07:52Z)
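The practical upshot, paraphrased loosely (so treat the details as assumptions): maintain persistent walkers whose log-weights accumulate the "work" done on them as the parameters move, and estimate the model-expectation term of the log-likelihood gradient with self-normalized weights instead of waiting for MCMC to re-equilibrate after every update.

```python
import torch

def ebm_loss(energy_fn, data, walkers, log_w):
    """Surrogate whose gradient is the EBM log-likelihood gradient,
        E_data[grad E] - E_model[grad E],
    with the intractable model expectation replaced by Jarzynski-style
    self-normalized importance weights over persistent walkers."""
    w = torch.softmax(log_w.detach(), dim=0)   # weights act as constants
    return energy_fn(data).mean() - (w * energy_fn(walkers)).sum()

# After each parameter update theta_t -> theta_{t+1}, the log-weights
# absorb the work done on each walker by the moving energy surface,
#   log_w += E_old(walker) - E_new(walker),
# and the walkers take a Langevin step under the new energy.
```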
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios. We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts than models trained with other DRO approaches. (A minimal parametric-reweighting sketch follows this entry.)
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
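The entry does not spell out the three ideas, but the core mechanic of parametric-likelihood-ratio DRO is to reweight per-example losses with a learned ratio model trained adversarially. The sketch below shows one hedged instantiation with a batch-level softmax parameterization; the ratio network, its update rule, and any regularization are illustrative assumptions.

```python
import torch

def dro_objective(per_example_loss: torch.Tensor,
                  ratio_logits: torch.Tensor) -> torch.Tensor:
    """Adversarially reweighted DRO objective.

    per_example_loss: shape (batch,), task loss per example
    ratio_logits:     shape (batch,), outputs of a learned ratio model
    """
    # Self-normalized likelihood ratios over the batch.
    w = torch.softmax(ratio_logits, dim=0)
    # The task model minimizes this weighted loss; the adversary
    # maximizes it (gradient ascent on the ratio model's parameters),
    # typically with a penalty keeping the ratios near uniform.
    return (w * per_example_loss).sum()
```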