Stochastic Mirror Descent in Average Ensemble Models
- URL: http://arxiv.org/abs/2210.15323v1
- Date: Thu, 27 Oct 2022 11:04:00 GMT
- Title: Stochastic Mirror Descent in Average Ensemble Models
- Authors: Taylan Kargin, Fariborz Salehi, Babak Hassibi
- Abstract summary: The stochastic mirror descent (SMD) algorithm is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD) as a special case.
In this paper we explore the performance of the SMD iterates on mean-field ensemble models.
- Score: 38.38572705720122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The stochastic mirror descent (SMD) algorithm is a general class of training
algorithms, which includes the celebrated stochastic gradient descent (SGD), as
a special case. It utilizes a mirror potential to influence the implicit bias
of the training algorithm. In this paper we explore the performance of the SMD
iterates on mean-field ensemble models. Our results generalize earlier ones
obtained for SGD on such models. The evolution of the distribution of
parameters is mapped to a continuous time process in the space of probability
distributions. Our main result gives a nonlinear partial differential equation
to which the continuous time process converges in the asymptotic regime of
large networks. The impact of the mirror potential appears through a
multiplicative term that is equal to the inverse of its Hessian and which can
be interpreted as defining a gradient flow over an appropriately defined
Riemannian manifold. We provide numerical simulations which allow us to study
and characterize the effect of the mirror potential on the performance of
networks trained with SMD for some binary classification problems.
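For context, the generic SMD update and its continuous-time (mirror flow) limit take the following standard form; this is background rather than the paper's ensemble-specific result. Here $\psi$ is the mirror potential, $\ell_{i_k}$ the loss on the sample drawn at step $k$, and $L$ the population loss:
\[
\nabla \psi(w_{k+1}) = \nabla \psi(w_k) - \eta \, \nabla \ell_{i_k}(w_k),
\qquad
\frac{dw}{dt} = -\big(\nabla^2 \psi(w)\big)^{-1} \nabla L(w).
\]
The second form makes explicit the multiplicative inverse-Hessian factor mentioned in the abstract: it is a gradient flow with respect to the Riemannian metric induced by $\nabla^2 \psi$, and it reduces to ordinary (stochastic) gradient descent/flow when $\psi(w) = \tfrac{1}{2}\|w\|_2^2$.
Below is a minimal, self-contained Python sketch of SMD for a binary classification (logistic regression) task, assuming a $q$-norm potential $\psi(w) = \tfrac{1}{q}\|w\|_q^q$. The potential, step size, toy data, and function names are illustrative assumptions and are not taken from the paper, which studies mean-field ensemble (wide network) models.
```python
import numpy as np

def grad_potential(w, q):
    # Gradient of the assumed mirror potential psi(w) = (1/q) * sum_i |w_i|^q  (q > 1).
    return np.sign(w) * np.abs(w) ** (q - 1)

def grad_potential_inv(z, q):
    # Inverse of the mirror map, (grad psi)^{-1}(z), applied coordinate-wise for this potential.
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_logistic(X, y, q=3.0, eta=0.05, epochs=50, seed=0):
    """Stochastic mirror descent for logistic regression with labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=1e-3, size=d)            # start close to the minimum of psi
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))   # gradient of log(1 + exp(-margin))
            z = grad_potential(w, q) - eta * grad          # step in the dual (mirror) space
            w = grad_potential_inv(z, q)                   # map back to the primal space
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    y = np.sign(X @ rng.normal(size=10))          # linearly separable toy labels
    w_hat = smd_logistic(X, y)
    print("train accuracy:", np.mean(np.sign(X @ w_hat) == y))
```
Swapping the potential (e.g. $q = 2$ recovers plain SGD) changes the implicit bias of the interpolating solution found, which is the effect the paper's simulations characterize in the ensemble setting.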
Related papers
- Mirror Diffusion Models for Constrained and Watermarked Generation [41.27274841596343]
Mirror Diffusion Models (MDM) are a new class of diffusion models that generate data on convex constrained sets without losing tractability.
For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information in generated data.
Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains.
arXiv Detail & Related papers (2023-10-02T14:26:31Z)
- The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models [37.6314945221565]
Deep networks are known to generalize well to unseen data.
Implicit regularization properties ensure that interpolating solutions with "good" properties are found.
We present simulation results that validate the theory and introduce two data models.
arXiv Detail & Related papers (2023-02-18T22:23:42Z)
- Scalable Dynamic Mixture Model with Full Covariance for Probabilistic Traffic Forecasting [16.04029885574568]
We propose a dynamic mixture of zero-mean Gaussian distributions for the time-varying error process.
The proposed method can be seamlessly integrated into existing deep-learning frameworks with only a few additional parameters to be learned.
We evaluate the proposed method on a traffic speed forecasting task and find that it not only improves model performance but also provides interpretable temporal correlation structures.
arXiv Detail & Related papers (2022-12-10T22:50:00Z)
- Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm performs distributed Bayesian filtering for finite-state hidden Markov models.
It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
arXiv Detail & Related papers (2022-12-05T19:40:17Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent [7.00422423634143]
We prove that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in linear regression (see the note after this list).
We derive a model estimation accuracy result in the setting when the true model is sparse.
arXiv Detail & Related papers (2022-04-29T19:37:24Z)
- Generative Adversarial Network for Probabilistic Forecast of Random Dynamical System [19.742888499307178]
We present a deep learning model for data-driven simulations of random dynamical systems without a distributional assumption.
We propose a regularization strategy for a generative adversarial network based on consistency conditions for the sequential inference problems.
The behavior of the proposed model is studied by using three processes with complex noise structures.
arXiv Detail & Related papers (2021-11-04T19:50:56Z)
- The Connection between Discrete- and Continuous-Time Descriptions of Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under coarse-graining.
This result explains why combining differencing schemes for derivative reconstruction with local-in-time inference approaches does not work for time series analysis of second- or higher-order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z)
- Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging to train.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
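A note on the "minimum mirror interpolant" referenced in the entries above: in over-parametrized linear regression with data $(X, y)$ admitting interpolating solutions, stochastic mirror descent with potential $\psi$ initialized at $w_0$ is commonly shown to converge to the Bregman projection of $w_0$ onto the solution set,
\[
w^\star = \arg\min_{w \,:\, Xw = y} D_\psi(w, w_0),
\qquad
D_\psi(w, w_0) = \psi(w) - \psi(w_0) - \langle \nabla \psi(w_0),\, w - w_0 \rangle,
\]
which reduces to $\arg\min_{Xw = y} \psi(w)$ when $w_0$ minimizes $\psi$. This is the standard statement of SMD's implicit bias in the interpolating linear setting; the exact assumptions and statements in the individual papers above may differ.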