Stochastic Gradient Langevin Dynamics Based on Quantization with
Increasing Resolution
- URL: http://arxiv.org/abs/2305.18864v2
- Date: Wed, 4 Oct 2023 07:50:15 GMT
- Title: Stochastic Gradient Langevin Dynamics Based on Quantization with
Increasing Resolution
- Authors: Jinwuk Seok and Changsik Cho
- Abstract summary: We propose an alternative stochastic descent learning equation based on quantized optimization for non-convex objective functions.
We demonstrate the effectiveness of the proposed algorithm on vanilla convolutional neural network (CNN) models and the ResNet-50 architecture across various data sets.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Stochastic learning dynamics based on Langevin or Levy stochastic
differential equations (SDEs) in deep neural networks control the variance of
the noise by varying the size of the mini-batch or by directly scaling the
injected noise. Since the noise variance affects the approximation performance, the
design of the additive noise is significant in SDE-based learning and practical
implementation. In this paper, we propose an alternative stochastic descent
learning equation based on quantized optimization for non-convex objective
functions, adopting a stochastic analysis perspective. The proposed method
employs a quantized optimization approach that utilizes Langevin SDE dynamics,
allowing for controllable noise with an identical distribution without the need
for additive noise or adjusting the mini-batch size. Numerical experiments
demonstrate the effectiveness of the proposed algorithm on vanilla convolutional
neural network (CNN) models and the ResNet-50 architecture across various data
sets. Furthermore, we provide a simple PyTorch implementation of the proposed
algorithm.
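The abstract only sketches the mechanism, so the following is a minimal, hypothetical PyTorch sketch of what a quantization-based descent step with increasing resolution could look like: each gradient update is stochastically rounded onto a grid whose spacing shrinks over iterations, so the zero-mean rounding error plays the role of the Langevin noise instead of an explicitly added noise term or a mini-batch-size schedule. The class name `QuantizedSGLD`, the 1/t resolution schedule, and the stochastic-rounding rule are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch (assumptions, NOT the authors' released code): an SGD-like
# update whose result is stochastically rounded onto a grid whose spacing
# shrinks as training proceeds (increasing quantization resolution), so the
# zero-mean rounding error supplies the Langevin-like noise.
import torch


class QuantizedSGLD(torch.optim.Optimizer):
    def __init__(self, params, lr=0.1, base=2.0, warmup=1):
        # `base` and `warmup` parameterize an assumed 1/t-type resolution schedule.
        defaults = dict(lr=lr, base=base, warmup=warmup)
        super().__init__(params, defaults)
        self._t = 0  # global step counter

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        self._t += 1
        for group in self.param_groups:
            # Grid spacing eta_t shrinks with the iteration count t.
            eta = 1.0 / (group["base"] * (group["warmup"] + self._t))
            for p in group["params"]:
                if p.grad is None:
                    continue
                target = p - group["lr"] * p.grad            # plain gradient step
                scaled = target / eta
                floor = torch.floor(scaled)
                frac = scaled - floor
                # Stochastic rounding: round up with probability equal to the
                # fractional part, so the quantization error has zero mean.
                rounded = floor + (torch.rand_like(frac) < frac).to(p.dtype)
                p.copy_(rounded * eta)
        return loss
```

It drops in like any other optimizer: construct it with `model.parameters()`, call `loss.backward()` and `step()` each iteration; as the grid spacing shrinks, the update approaches plain SGD while the early, coarse rounding provides the exploratory noise.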
Related papers
- Variational Neural Stochastic Differential Equations with Change Points [4.692174333076032]
We explore modeling change points in time-series data using neural stochastic differential equations (neural SDEs).
We propose a novel model formulation and training procedure based on the variational autoencoder (VAE) framework for modeling time-series as a neural SDE.
We present an empirical evaluation that demonstrates the expressive power of our proposed model, showing that it can effectively model both classical parametric SDEs and some real datasets with distribution shifts.
arXiv Detail & Related papers (2024-11-01T14:46:17Z) - Noise in the reverse process improves the approximation capabilities of
diffusion models [27.65800389807353]
In score-based generative models (SGMs), the state of the art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts.
This paper delves into the heart of this phenomenon, comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes.
We analyze the ability of neural SDEs to approximate trajectories of the Fokker-Planck equation, revealing the advantages of stochasticity.
arXiv Detail & Related papers (2023-12-13T02:39:10Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on kernel learning and Bayesian spike-and-slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss between the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - The effective noise of Stochastic Gradient Descent [9.645196221785694]
Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology.
We characterize the effective noise of SGD and a recently-introduced variant, persistent SGD, in a neural network model.
We find that noisier algorithms lead to wider decision boundaries of the corresponding constraint satisfaction problem.
arXiv Detail & Related papers (2021-12-20T20:46:19Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Stochastic gradient descent with noise of machine learning type. Part
II: Continuous time analysis [0.0]
We show that in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function in a sense which is different from the flat minimum selection of continuous time SGD with homogeneous noise.
arXiv Detail & Related papers (2021-06-04T16:34:32Z) - Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.
arXiv Detail & Related papers (2021-02-13T21:28:09Z) - Noise Optimization for Artificial Neural Networks [0.973490996330539]
We propose a new technique to compute the pathwise gradient estimate with respect to the standard deviation of the Gaussian noise added to each neuron of the ANN.
In numerical experiments, the proposed method achieves significant improvements in the robustness of several popular ANN structures.
arXiv Detail & Related papers (2021-02-06T08:30:20Z) - Neural Control Variates [71.42768823631918]
We show that a set of neural networks can tackle the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
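As an aside on the control-variate idea summarized in the last entry, here is a minimal PyTorch sketch of the general technique under toy assumptions (a 1-D uniform domain, a synthetic integrand `f`, and a small MLP `g` are illustrative choices; the paper's actual setting is light transport with a tractable parametric model): `g` is trained with a squared-residual loss, its integral is computed cheaply on a fine grid, and a Monte Carlo estimate of the residual keeps the overall estimator unbiased while its variance shrinks as `g` improves.

```python
# Minimal control-variate sketch (toy assumptions, not the paper's renderer).
import torch
import torch.nn as nn

f = lambda x: torch.exp(-x) * torch.sin(8.0 * x)                   # "expensive" integrand (toy stand-in)
g = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))   # learned approximation of f

opt = torch.optim.Adam(g.parameters(), lr=1e-3)
for _ in range(2000):
    x = torch.rand(256, 1)                                          # uniform samples on [0, 1]
    loss = ((f(x) - g(x)) ** 2).mean()                              # squared-residual training loss
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    grid = torch.linspace(0.0, 1.0, 10001).unsqueeze(1)
    integral_g = g(grid).mean()                                     # cheap, near-exact integral of g on [0, 1]
    x = torch.rand(1024, 1)
    estimate = integral_g + (f(x) - g(x)).mean()                    # unbiased control-variate estimate of the integral of f
    print(float(estimate))
```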