Perceptual Gradient Networks
- URL: http://arxiv.org/abs/2105.01957v1
- Date: Wed, 5 May 2021 09:58:22 GMT
- Title: Perceptual Gradient Networks
- Authors: Dmitry Nikulin, Roman Suvorov, Aleksei Ivakhnenko, Victor Lempitsky
- Abstract summary: We propose a way to train generator networks using approximations of perceptual loss that are computed without forward-backward passes.
We use a simpler perceptual gradient network that directly synthesizes the gradient field of a perceptual loss.
- Score: 4.897538221461331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many applications of deep learning for image generation use perceptual losses
for either training or fine-tuning of the generator networks. The use of
perceptual loss, however, incurs repeated forward-backward passes through a large
image classification network, as well as a considerable memory overhead required
to store the activations of that network. It is therefore desirable, or
sometimes even critical, to eliminate these overheads.
In this work, we propose a way to train generator networks using
approximations of perceptual loss that are computed without forward-backward
passes. Instead, we use a simpler perceptual gradient network that directly
synthesizes the gradient field of a perceptual loss. We introduce the concept
of proxy targets, which stabilize the predicted gradient so that training
with it does not diverge or oscillate. In addition, our method
allows interpretation of the predicted gradient, providing insight into the
internals of perceptual loss and suggesting potential ways to improve it in
future work.
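The core idea of the abstract can be illustrated with a toy sketch (pure Python, not the paper's architecture): a gradient field written in the proxy-target form x − proxy lets a generator output descend a "perceptual" loss without ever backpropagating through the feature extractor. The extractor `phi`, the loss, and the choice of proxy below are all illustrative assumptions, not the paper's components.

```python
import math

def phi(v):
    # Stand-in "feature extractor": an elementwise nonlinearity playing the
    # role of the large pretrained classifier used by a real perceptual loss.
    return [math.tanh(2.0 * x) for x in v]

def perceptual_loss(x, y):
    # Mean squared distance between feature maps of x and target y.
    fx, fy = phi(x), phi(y)
    return sum((a - b) ** 2 for a, b in zip(fx, fy)) / len(x)

def true_grad(x, y):
    # Analytic gradient of the loss above (what backprop through phi yields);
    # uses d/dx tanh(2x) = 2 * (1 - tanh(2x)^2).
    fx, fy = phi(x), phi(y)
    return [2.0 * (a - b) * 2.0 * (1.0 - a * a) / len(x)
            for a, b in zip(fx, fy)]

def proxy_grad(x, y):
    # Proxy-target form: the predicted gradient is x - proxy, so a descent
    # step pulls x toward a concrete image. Here the proxy is simply the
    # target y itself -- a crude stand-in for the paper's learned gradient
    # network -- and no pass through phi is needed.
    return [a - b for a, b in zip(x, y)]

def descend(x, y, grad_fn, lr=0.1, steps=100):
    # Plain gradient descent on x using whichever gradient field is given.
    for _ in range(steps):
        g = grad_fn(x, y)
        x = [a - lr * gi for a, gi in zip(x, g)]
    return x

y = [0.8, -0.3, 0.1, 0.5]      # "target image"
x0 = [0.0, 0.9, -0.7, -0.2]    # generator output to be improved
before = perceptual_loss(x0, y)
after_true = perceptual_loss(descend(x0, y, true_grad), y)
after_proxy = perceptual_loss(descend(x0, y, proxy_grad), y)
print(before, after_true, after_proxy)  # both descents reduce the loss
```

In the paper the proxy is produced by a learned perceptual gradient network conditioned on the current image and the target; y serves as the proxy here only to keep the sketch self-contained.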
Related papers
- Contrastive Forward-Forward: A Training Algorithm of Vision Transformer [1.6574413179773757]
Forward-Forward is a training algorithm that more closely resembles what occurs in the brain than backpropagation does.
In this work, we have extended the use of this algorithm to a more complex and modern network, namely the Vision Transformer.
Our proposed algorithm performs significantly better than the baseline Forward-Forward, increasing accuracy by up to 10% and speeding up convergence by 5 to 20 times on Vision Transformer.
arXiv Detail & Related papers (2025-02-01T21:41:59Z)
- Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks [15.691263438655842]
Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention.
Training an SNN directly poses a challenge due to the undefined gradient of the firing spike process.
We propose a shortcut back-propagation method in our paper, which advocates for transmitting the gradient directly from the loss to the shallow layers.
arXiv Detail & Related papers (2024-01-09T10:54:41Z)
- Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
arXiv Detail & Related papers (2023-12-03T02:26:58Z)
- Can Forward Gradient Match Backpropagation? [2.875726839945885]
Forward Gradients have been shown to be usable for neural network training.
We propose to strongly bias our gradient guesses toward much more promising directions, such as feedback obtained from small, local auxiliary networks.
We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.
arXiv Detail & Related papers (2023-06-12T08:53:41Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Low-memory stochastic backpropagation with multi-channel randomized trace estimation [6.985273194899884]
We propose to approximate the gradient of convolutional layers in neural networks with a multi-channel randomized trace estimation technique.
Compared to other methods, this approach is simple, amenable to analyses, and leads to a greatly reduced memory footprint.
We discuss the performance of networks trained with backpropagation and how the approximation error can be controlled while keeping memory usage and computational overhead low.
arXiv Detail & Related papers (2021-06-13T13:54:02Z)
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z)
- Deep Networks from the Principle of Rate Reduction [32.87280757001462]
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification.
We show that the basic iterative ascent gradient scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer.
All components of this "white box" network have precise optimization, statistical, and geometric interpretation.
arXiv Detail & Related papers (2020-10-27T06:01:43Z)
- Boosting Gradient for White-Box Adversarial Attacks [60.422511092730026]
We propose a universal adversarial example generation method, called ADV-ReLU, to enhance the performance of gradient based white-box attack algorithms.
Our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a part of them to update the misleading gradients.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.