Related papers: Randomized Automatic Differentiation

Randomized Automatic Differentiation

URL: http://arxiv.org/abs/2007.10412v2
Date: Sat, 13 Mar 2021 18:28:19 GMT
Title: Randomized Automatic Differentiation
Authors: Deniz Oktay, Nick McGreivy, Joshua Aduol, Alex Beatson, Ryan P. Adams
Abstract summary: We develop a general framework and approach for randomized automatic differentiation (RAD) RAD can allow unbiased estimates to be computed with reduced memory in return for variance. We show that RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks.
Score: 22.95414996614006
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which can allow unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.

Related papers

Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling [44.31966204357333]
We develop memory-efficient algorithms for large-scale machine learning problems. We use two key techniques to make our approach memory-efficient and avoid full computations.
arXiv Detail & Related papers (2025-02-20T15:37:45Z)
Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. We propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction.
arXiv Detail & Related papers (2024-09-29T15:17:33Z)
Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks. We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights. Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference. It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem. CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint. It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference. Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training [2.9649783577150837]
We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We derive analytical expressions for the maximal descent and adaptive training regimens for smooth, non-Newton deep neural networks. We validate our claims on the VGG/ResNet and ImageNet datasets.
arXiv Detail & Related papers (2020-06-16T11:55:45Z)
Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents. Our models perform equivalently to backprop on challenging machine learning benchmarks. Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.