Deriving Differential Target Propagation from Iterating Approximate
Inverses
- URL: http://arxiv.org/abs/2007.15139v2
- Date: Mon, 17 Aug 2020 19:09:41 GMT
- Title: Deriving Differential Target Propagation from Iterating Approximate
Inverses
- Authors: Yoshua Bengio
- Abstract summary: We show that a particular form of target propagation, relying on learned inverses of each layer and differential in the sense that the target is a small perturbation of the forward propagation, gives rise to an update rule that corresponds to an approximate Gauss-Newton gradient-based optimization.
We consider several iterative calculations based on local auto-encoders at each layer in order to achieve more precise inversions for more accurate target propagation.
- Score: 91.3755431537592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that a particular form of target propagation, i.e., relying on
learned inverses of each layer, which is differential, i.e., where the target
is a small perturbation of the forward propagation, gives rise to an update
rule which corresponds to an approximate Gauss-Newton gradient-based
optimization, without requiring the manipulation or inversion of large
matrices. What is interesting is that this is more biologically plausible than
back-propagation yet may turn out to implicitly provide a stronger optimization
procedure. Extending difference target propagation, we consider several
iterative calculations based on local auto-encoders at each layer in order to
achieve more precise inversions for more accurate target propagation and we
show that these iterative procedures converge exponentially fast if the
auto-encoding function minus the identity function has a Lipschitz constant
smaller than one, i.e., the auto-encoder is coarsely succeeding at performing
an inversion. We also propose a way to normalize the changes at each layer to
take into account the relative influence of each layer on the output, so that
larger weight changes are made on more influential layers, as would happen in
ordinary back-propagation with gradient descent.
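As a concrete illustration of the iterative inversion described in the abstract, here is a minimal NumPy sketch, assuming a toy linear layer f(h) = W h + b and a learned but imperfect inverse g(y) = V (y - b) with V ≈ W^{-1} (the linear setting and the names W, V, f, g are illustrative choices, not the paper's notation). The update h ← h + g(ŷ) − g(f(h)) is a fixed-point iteration whose contraction factor here is ‖VW − I‖, i.e., the Lipschitz constant of the auto-encoding map g∘f minus the identity, so it converges geometrically exactly when that quantity is below one, matching the condition stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy layer and a *learned*, imperfect inverse (illustrative stand-ins,
# not the paper's notation).
W = rng.standard_normal((d, d)) + 3.0 * np.eye(d)
b = rng.standard_normal(d)
V = np.linalg.inv(W) + 0.005 * rng.standard_normal((d, d))  # approximate W^{-1}

def f(h):  # forward computation of the layer
    return W @ h + b

def g(y):  # learned approximate inverse of f
    return V @ (y - b)

# Contraction factor: Lipschitz constant of g(f(.)) minus the identity,
# which here equals the spectral norm of V W - I; must be < 1 to converge.
print("||VW - I|| =", np.linalg.norm(V @ W - np.eye(d), 2))

# Target to invert: y_hat = f(h_true) for some h_true we pretend not to know.
h_true = rng.standard_normal(d)
y_hat = f(h_true)

# Fixed-point iteration h <- h + g(y_hat) - g(f(h)); the inversion error
# shrinks geometrically when the contraction condition above holds.
h = np.zeros(d)
for k in range(20):
    h = h + g(y_hat) - g(f(h))
    if k % 5 == 0:
        print(f"iter {k:2d}  ||f(h) - y_hat|| = {np.linalg.norm(f(h) - y_hat):.2e}")
```

When the target ŷ is only a small perturbation of the forward activation f(h), a first-order expansion of g around f(h) turns the same correction into a Jacobian-inverse-weighted step, which is the sense in which the abstract relates this update to approximate Gauss-Newton optimization.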
Related papers
- Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms [28.046728466038022]
We study theoretical properties of a broad class of regularized algorithms with vector-valued output.
We rigorously confirm the so-called saturation effect for ridge regression with vector-valued output.
We present an upper bound on the finite-sample risk of general vector-valued spectral algorithms.
arXiv Detail & Related papers (2024-05-23T16:45:52Z) - An Alternative Graphical Lasso Algorithm for Precision Matrices [0.0]
We present a new/improved (dual-primal) DP-GLasso algorithm for estimating sparse precision matrices.
We show that the regularized normal log-likelihood naturally decouples into a sum of two easy-to-minimize convex functions, one of which is a Lasso regression problem.
Our algorithm has the precision matrix as its optimization target right at the outset, and retains all the favorable properties of the DP-GLasso algorithm.
arXiv Detail & Related papers (2024-03-19T02:01:01Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights; a minimal sketch of the basic forward-gradient estimator appears after this list.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Orthogonalizing Convolutional Layers with the Cayley Transform [83.73855414030646]
We propose and evaluate an alternative approach to parameterize convolutional layers that are constrained to be orthogonal.
We show that our method indeed preserves orthogonality to a high degree even for large convolutions.
arXiv Detail & Related papers (2021-04-14T23:54:55Z) - Training Invertible Linear Layers through Rank-One Perturbations [0.0]
This work presents a novel approach for training invertible linear layers.
In lieu of directly optimizing the network parameters, we train rank-one perturbations and add them to the actual weight matrices infrequently.
We show how such invertible blocks improve the mixing and thus the mode separation of the resulting normalizing flows.
arXiv Detail & Related papers (2020-10-14T12:43:47Z) - Feature Whitening via Gradient Transformation for Improved Convergence [3.5579740292581]
We address the complexity drawbacks of feature whitening.
We derive an equivalent method, which replaces the sample transformations by a transformation to the weight gradients, applied to every batch of B samples.
We exemplify the proposed algorithms with ResNet-based networks for image classification on the CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2020-10-04T11:30:20Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve for one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
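For the "Scaling Forward Gradient With Local Losses" entry above, the following minimal sketch shows only the basic forward-gradient estimator that line of work builds on: an unbiased gradient estimate formed from a single directional derivative along a random direction (approximated here with finite differences; in practice it would be a forward-mode JVP). The quadratic loss and all names are illustrative assumptions; the paper's activity-perturbation and local-loss components are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Illustrative smooth loss with a known gradient (not from the paper).
A = rng.standard_normal((d, d))
A = A.T @ A + np.eye(d)

def loss(w):
    return 0.5 * w @ A @ w

def true_grad(w):  # used only to check the estimator
    return A @ w

def forward_gradient(w, eps=1e-5):
    # Sample a random direction, take the directional derivative of the loss
    # along it (a single forward pass / JVP in practice; finite differences
    # here for simplicity), and scale the direction by it.
    # Unbiased because E[v v^T] = I for v ~ N(0, I).
    v = rng.standard_normal(d)
    dirderiv = (loss(w + eps * v) - loss(w - eps * v)) / (2 * eps)
    return dirderiv * v

w = rng.standard_normal(d)
est = np.mean([forward_gradient(w) for _ in range(5000)], axis=0)
print("relative error of averaged estimate:",
      np.linalg.norm(est - true_grad(w)) / np.linalg.norm(true_grad(w)))
```

Averaging many such single-direction estimates recovers the true gradient, but the variance of a single estimate grows with dimension, which is the issue the activity-perturbation and local-loss ideas in that paper are aimed at.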
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.