Related papers: Score-Based Training for Energy-Based TTS Models

Related papers

A Diffusive Classification Loss for Learning Energy-based Generative Models [27.078167178968076]
We introduce the Diffusive Classification (DiffCLF) objective, a simple method that avoids blindness while remaining computationally efficient.<n>We validate the effectiveness of DiffCLF by comparing the estimated energies against ground truth in analytical Gaussian mixture cases.<n>Our results show that DiffCLF enables EBMs with higher fidelity and broader applicability than existing approaches.
arXiv Detail & Related papers (2026-01-28T20:37:53Z)
Score-based Membership Inference on Diffusion Models [3.742113529511043]
Membership inference attacks (MIAs) against diffusion models have emerged as a pressing privacy concern.<n>We present a theoretical and empirical study of score-based MIAs, focusing on the predicted noise vectors that diffusion models learn to approximate.<n>We show that the expected denoiser output points toward a kernel-weighted local mean of nearby training samples, such that its norm encodes proximity to the training set and thereby reveals membership.
arXiv Detail & Related papers (2025-09-29T16:28:55Z)
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training [64.0932926819307]
We present Warmup-Stable and Merge (WSM), a framework that establishes a formal connection between learning rate decay and model merging.<n>WSM provides a unified theoretical foundation for emulating various decay strategies.<n>Our framework consistently outperforms the widely-adopted Warmup-Stable-Decay (WSD) approach across multiple benchmarks.
arXiv Detail & Related papers (2025-07-23T16:02:06Z)
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models.<n>Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement.<n>We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
Expressive Score-Based Priors for Distribution Matching with Geometry-Preserving Regularization [10.432302605566331]
gradient-based DM training only requires the prior's score function -- not its density.<n>This approach eliminates biases from fixed priors, enabling more effective use of geometry-preserving regularization.<n>Our method also demonstrates better stability and computational efficiency compared to other diffusion-based priors.
arXiv Detail & Related papers (2025-06-17T15:08:16Z)
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training [57.03005244917803]
Large language models (LLMs) often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training.<n>Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT)<n> Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks.
arXiv Detail & Related papers (2025-06-11T06:30:28Z)
Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching.<n>We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.<n>Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
On the connection between Noise-Contrastive Estimation and Contrastive Divergence [13.312007032203859]
Noise-contrastive estimation (NCE) is a popular method for estimating unnormalised probabilistic models. We show that two NCE criteria, ranking NCE and conditional NCE, can be viewed as ML estimation methods.
arXiv Detail & Related papers (2024-02-26T16:04:47Z)
Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming. There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
Balanced Training of Energy-Based Models with Adaptive Flow Sampling [13.951904929884618]
Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. We propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF) Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times.
arXiv Detail & Related papers (2023-06-01T13:58:06Z)
Self-Adapting Noise-Contrastive Estimation for Energy-Based Models [0.0]
Training energy-based models with noise-contrastive estimation (NCE) is theoretically feasible but practically challenging. Previous works have explored modelling the noise distribution as a separate generative model, and then concurrently training this noise model with the EBM. This thesis proposes a self-adapting NCE algorithm which uses static instances of the EBM along its training trajectory as the noise distribution.
arXiv Detail & Related papers (2022-11-03T15:17:43Z)
Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models. In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints. A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency) Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed. We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA)
arXiv Detail & Related papers (2020-05-28T13:50:08Z)
Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging. We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence. Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.