Stochastic Gradient Langevin Dynamics Based on Quantization with
Increasing Resolution
- URL: http://arxiv.org/abs/2305.18864v2
- Date: Wed, 4 Oct 2023 07:50:15 GMT
- Title: Stochastic Gradient Langevin Dynamics Based on Quantization with
Increasing Resolution
- Authors: Jinwuk Seok and Changsik Cho
- Abstract summary: We propose an alternative stochastic descent learning equation based on quantized optimization for non-convex objective functions.
We demonstrate the effectiveness of the proposed algorithm on vanilla convolutional neural network (CNN) models and the ResNet-50 architecture across various data sets.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Stochastic learning dynamics based on Langevin or Levy stochastic
differential equations (SDEs) in deep neural networks control the variance of
the noise by varying the size of the mini-batch or by directly scaling the
injected noise. Since the noise variance affects the approximation performance, the
design of the additive noise is significant in SDE-based learning and practical
implementation. In this paper, we propose an alternative stochastic descent
learning equation based on quantized optimization for non-convex objective
functions, adopting a stochastic analysis perspective. The proposed method
employs a quantized optimization approach that utilizes Langevin SDE dynamics,
allowing for controllable noise with an identical distribution without the need
for additive noise or adjusting the mini-batch size. Numerical experiments
demonstrate the effectiveness of the proposed algorithm on vanilla convolutional
neural network (CNN) models and the ResNet-50 architecture across various data
sets. Furthermore, we provide a simple PyTorch implementation of the proposed
algorithm.
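The abstract only sketches the mechanism, so the following is a minimal, hypothetical PyTorch sketch of what a quantization-based descent step with increasing resolution could look like: each gradient update is stochastically rounded onto a grid whose spacing shrinks over iterations, so the zero-mean rounding error plays the role of the Langevin noise instead of an explicitly added noise term or a mini-batch-size schedule. The class name `QuantizedSGLD`, the 1/t resolution schedule, and the stochastic-rounding rule are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch (assumptions, NOT the authors' released code): an SGD-like
# update whose result is stochastically rounded onto a grid whose spacing
# shrinks as training proceeds (increasing quantization resolution), so the
# zero-mean rounding error supplies the Langevin-like noise.
import torch


class QuantizedSGLD(torch.optim.Optimizer):
    def __init__(self, params, lr=0.1, base=2.0, warmup=1):
        # `base` and `warmup` parameterize an assumed 1/t-type resolution schedule.
        defaults = dict(lr=lr, base=base, warmup=warmup)
        super().__init__(params, defaults)
        self._t = 0  # global step counter

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        self._t += 1
        for group in self.param_groups:
            # Grid spacing eta_t shrinks with the iteration count t.
            eta = 1.0 / (group["base"] * (group["warmup"] + self._t))
            for p in group["params"]:
                if p.grad is None:
                    continue
                target = p - group["lr"] * p.grad            # plain gradient step
                scaled = target / eta
                floor = torch.floor(scaled)
                frac = scaled - floor
                # Stochastic rounding: round up with probability equal to the
                # fractional part, so the quantization error has zero mean.
                rounded = floor + (torch.rand_like(frac) < frac).to(p.dtype)
                p.copy_(rounded * eta)
        return loss
```

It drops in like any other optimizer: construct it with `model.parameters()`, call `loss.backward()` and `step()` each iteration; as the grid spacing shrinks, the update approaches plain SGD while the early, coarse rounding provides the exploratory noise.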
Related papers
- Variational Neural Stochastic Differential Equations with Change Points [4.692174333076032]
We explore modeling change points in time-series data using neural stochastic differential equations (neural SDEs).
We propose a novel model formulation and training procedure based on the variational autoencoder (VAE) framework for modeling time-series as a neural SDE.
We present an empirical evaluation that demonstrates the expressive power of our proposed model, showing that it can effectively model both classical parametric SDEs and some real datasets with distribution shifts.
arXiv Detail & Related papers (2024-11-01T14:46:17Z) - Noise in the reverse process improves the approximation capabilities of
diffusion models [27.65800389807353]
In score-based generative models (SGMs), the state of the art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts.
This paper delves into the heart of this phenomenon, comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes.
We analyze the ability of neural SDEs to approximate trajectories of the Fokker-Planck equation, revealing the advantages of stochasticity.
arXiv Detail & Related papers (2023-12-13T02:39:10Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on kernel learning and Bayesian spike-and-slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss between the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - The effective noise of Stochastic Gradient Descent [9.645196221785694]
Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology.
We characterize the effective noise of SGD and a recently-introduced variant, persistent SGD, in a neural network model.
We find that noisier algorithms lead to wider decision boundaries of the corresponding constraint satisfaction problem.
arXiv Detail & Related papers (2021-12-20T20:46:19Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Stochastic gradient descent with noise of machine learning type. Part
II: Continuous time analysis [0.0]
We show that in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function in a sense which is different from the flat minimum selection of continuous time SGD with homogeneous noise.
arXiv Detail & Related papers (2021-06-04T16:34:32Z) - Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.
arXiv Detail & Related papers (2021-02-13T21:28:09Z) - Noise Optimization for Artificial Neural Networks [0.973490996330539]
We propose a new technique to compute the pathwise gradient estimate with respect to the standard deviation of the Gaussian noise added to each neuron of the ANN.
In numerical experiments, the proposed method achieves significant improvements in the robustness of several popular ANN structures.
arXiv Detail & Related papers (2021-02-06T08:30:20Z) - Neural Control Variates [71.42768823631918]
We show that a set of neural networks can tackle the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
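As an aside on the control-variate idea summarized in the last entry, here is a minimal PyTorch sketch of the general technique under toy assumptions (a 1-D uniform domain, a synthetic integrand `f`, and a small MLP `g` are illustrative choices; the paper's actual setting is light transport with a tractable parametric model): `g` is trained with a squared-residual loss, its integral is computed cheaply on a fine grid, and a Monte Carlo estimate of the residual keeps the overall estimator unbiased while its variance shrinks as `g` improves.

```python
# Minimal control-variate sketch (toy assumptions, not the paper's renderer).
import torch
import torch.nn as nn

f = lambda x: torch.exp(-x) * torch.sin(8.0 * x)                   # "expensive" integrand (toy stand-in)
g = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))   # learned approximation of f

opt = torch.optim.Adam(g.parameters(), lr=1e-3)
for _ in range(2000):
    x = torch.rand(256, 1)                                          # uniform samples on [0, 1]
    loss = ((f(x) - g(x)) ** 2).mean()                              # squared-residual training loss
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    grid = torch.linspace(0.0, 1.0, 10001).unsqueeze(1)
    integral_g = g(grid).mean()                                     # cheap, near-exact integral of g on [0, 1]
    x = torch.rand(1024, 1)
    estimate = integral_g + (f(x) - g(x)).mean()                    # unbiased control-variate estimate of the integral of f
    print(float(estimate))
```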