Convergence, design and training of continuous-time dropout as a random batch method
- URL: http://arxiv.org/abs/2510.13134v1
- Date: Wed, 15 Oct 2025 04:19:01 GMT
- Title: Convergence, design and training of continuous-time dropout as a random batch method
- Authors: Antonio Álvarez-López, Martín Hernández
- Abstract summary: We study dropout regularization in continuous-time models through the lens of random-batch methods. We construct an unbiased, well-posed estimator that mimics dropout by sampling neuron batches over time intervals of length $h$. We then specialize to a single-layer neural ODE and validate the theory on classification and flow matching.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study dropout regularization in continuous-time models through the lens of random-batch methods -- a family of stochastic sampling schemes originally devised to reduce the computational cost of interacting particle systems. We construct an unbiased, well-posed estimator that mimics dropout by sampling neuron batches over time intervals of length $h$. Trajectory-wise convergence is established with linear rate in $h$ for the expected uniform error. At the distribution level, we establish stability for the associated continuity equation, with total-variation error of order $h^{1/2}$ under mild moment assumptions. During training with fixed batch sampling across epochs, a Pontryagin-based adjoint analysis bounds deviations in the optimal cost and control, as well as in gradient-descent iterates. On the design side, we compare convergence rates for canonical batch sampling schemes, recover standard Bernoulli dropout as a special case, and derive a cost--accuracy trade-off yielding a closed-form optimal $h$. We then specialize to a single-layer neural ODE and validate the theory on classification and flow matching, observing the predicted rates, regularization effects, and favorable runtime and memory profiles.
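Below is a minimal sketch of the core idea: a random-batch dropout estimator for a scalar single-layer neural ODE, where a Bernoulli neuron batch is resampled at the start of each subinterval of length $h$ and the surviving neurons are rescaled by $1/p$ so the drift is unbiased. Everything here is illustrative and assumed, not taken from the paper: the neuron count `P`, the `tanh` activation, the helpers `full_drift`/`dropout_drift`, and the endpoint-error Monte Carlo check at the end (the paper's convergence result concerns the expected uniform error over the trajectory).

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar single-layer neural ODE: dx/dt = (1/P) * sum_i w_i * tanh(a_i * x + b_i).
P = 64                         # hypothetical neuron count
a, b, w = (rng.normal(size=P) for _ in range(3))

def full_drift(x):
    return np.mean(w * np.tanh(a * x + b))

def dropout_drift(x, mask, p):
    # Inverted-dropout rescaling by 1/p makes the batch drift unbiased:
    # E[mask_i / p] = 1 for every neuron i.
    return np.mean(mask * w * np.tanh(a * x + b)) / p

def integrate(x0, T=1.0, dt=1e-3, h=None, p=0.5):
    """Forward Euler. If h is set, resample a Bernoulli(p) neuron batch
    at the start of each subinterval of length h (random-batch dropout)."""
    x, t, mask, next_resample = x0, 0.0, None, 0.0
    while t < T:
        if h is not None and t >= next_resample:
            mask = rng.random(P) < p   # keep each neuron independently w.p. p
            next_resample += h
        x += dt * (full_drift(x) if h is None else dropout_drift(x, mask, p))
        t += dt
    return x

# The linear-in-h convergence rate predicts the mean error should roughly
# halve each time h is halved; this crude check only measures endpoint error.
x_ref = integrate(1.0)
for h in (0.5, 0.25, 0.125, 0.0625):
    errs = [abs(integrate(1.0, h=h) - x_ref) for _ in range(200)]
    print(f"h = {h:6.4f}   mean |x_h(T) - x(T)| ~ {np.mean(errs):.4f}")
```

Taking `p = 0.5` with one subinterval per layer-time recovers standard Bernoulli dropout as a special case; shrinking `h` trades extra mask resampling cost for lower trajectory error, which is the cost-accuracy trade-off the abstract mentions.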
Related papers
- Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth labels and the teacher's own predictions. We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level. We propose a consistent one-shot tuning method to estimate $\star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z) - Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees [9.180350432640912]
We study the sampling efficiency of score-based discrete diffusion models under a continuous-time Markov chain (CTMC) formulation. For uniform discrete diffusion, we show that the $\tau$-leaping algorithm achieves a complexity of order $\tilde{O}(d/\varepsilon)$. For masking discrete diffusion, we introduce a modified $\tau$-leaping sampler whose convergence rate is governed by an intrinsic information-theoretic quantity.
arXiv Detail & Related papers (2026-02-16T18:48:17Z) - Variational Optimality of Föllmer Processes in Generative Diffusions [20.583125441867434]
We analyze generative diffusions that transport a point mass to a prescribed target distribution over a finite time horizon. We show that the diffusion coefficient can be tuned a posteriori without changing the time-marginal distributions.
arXiv Detail & Related papers (2026-02-11T16:15:19Z) - Learnable Chernoff Baselines for Inference-Time Alignment [64.81256817158851]
We introduce Learnable Chernoff Baselines as a method for efficiently and approximately sampling from exponentially tilted kernels. We establish total-variation guarantees to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling.
arXiv Detail & Related papers (2026-02-08T00:09:40Z) - Low-Dimensional Adaptation of Rectified Flow: A New Perspective through the Lens of Diffusion and Stochastic Localization [59.04314685837778]
Rectified flow (RF) has gained considerable popularity due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic low dimensionality of the support of the target distribution to accelerate sampling. We show that, using a carefully designed time-discretization scheme and with sufficiently accurate drift estimates, the RF sampler enjoys a complexity of order $O(k/\varepsilon)$.
arXiv Detail & Related papers (2026-01-21T22:09:27Z) - Dimension-free error estimate for diffusion model and optimal scheduling [22.20348860913421]
Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. Previous analyses have quantified the error between the generated and the true data distributions in terms of Wasserstein distance or Kullback-Leibler divergence. In this work, we derive an explicit, dimension-free bound on the discrepancy between the generated and the true data distributions.
arXiv Detail & Related papers (2025-12-01T15:58:20Z) - Generative Modeling with Continuous Flows: Sample Complexity of Flow Matching [60.37045080890305]
We provide the first analysis of the sample complexity for flow-matching based generative models. We decompose the velocity field estimation error into neural-network approximation error, statistical error due to the finite sample size, and optimization error due to the finite number of optimization steps for estimating the velocity field.
arXiv Detail & Related papers (2025-12-01T05:14:25Z) - Preconditioned Regularized Wasserstein Proximal Sampling [2.7957842724446174]
We consider sampling from a noise-free distribution by evolving finitely many particles. For a class of potentials, we provide a non-asymptotic convergence analysis and explicitly characterize the bias, which depends on the regularization.
arXiv Detail & Related papers (2025-09-01T18:04:31Z) - Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence [8.952347049759094]
We construct prediction intervals for neural network regressors post-hoc without held-out data. We train just once and locally perturb model parameters using Gauss-Newton influence.
arXiv Detail & Related papers (2025-07-27T13:34:32Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Estimating stationary mass, frequency by frequency [11.476508212290275]
We consider the problem of estimating the probability mass placed by the stationary distribution of an exponentially $\alpha$-mixing process. We estimate this vector of probabilities in total-variation distance, showing universal consistency in $n$. We develop complementary tools, including concentration inequalities for a natural self-normalized statistic of mixing sequences, that may prove independently useful in the design and analysis of estimators for related problems.
arXiv Detail & Related papers (2025-03-17T04:24:21Z) - An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning [55.467047686093025]
A common approach to alleviate such forgetting is to rehearse samples from prior tasks during fine-tuning. We propose a sampling scheme, mix-cd, that prioritizes rehearsal of "collateral damage" samples. Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings.
arXiv Detail & Related papers (2024-02-12T22:32:12Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective.
This paper presents a regularized vector quantization framework that mitigates the above issues effectively by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z)