Rotate the ReLU to implicitly sparsify deep networks
- URL: http://arxiv.org/abs/2206.00488v1
- Date: Wed, 1 Jun 2022 13:38:45 GMT
- Title: Rotate the ReLU to implicitly sparsify deep networks
- Authors: Nancy Nayak, Sheetal Kalyani
- Abstract summary: We propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture.
We show that this activation, wherein the rotation is learned via training, results in the elimination of those parameters/filters in the network that are not important for the task.
- Score: 13.203765985718201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of Deep Neural Network based solutions for a variety of real-life
tasks, having a compact and energy-efficient deployable model has become fairly
important. Most of the existing deep architectures use the Rectified Linear Unit
(ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU
activation to give one more degree of freedom to the architecture. We show that
this activation, wherein the rotation is learned via training, results in the
elimination of those parameters/filters in the network that are not important
for the task. In other words, the rotated ReLU appears to perform implicit
sparsification. The slopes of the rotated ReLU activations act as coarse
feature extractors and unnecessary features can be eliminated before
retraining. Our studies indicate that features consistently pass through a
smaller number of filters in architectures such as ResNet and its variants.
Hence, by rotating the ReLU, the weights or the filters that are not necessary
are automatically identified and can be dropped thus giving rise to significant
savings in memory and computation. Furthermore, in some cases, we notice that,
along with the savings in memory and computation, we also obtain improvements
over the reported performance of the corresponding baseline work on popular
datasets such as MNIST, CIFAR-10, CIFAR-100, and SVHN.
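For intuition, the following is a minimal sketch of how such a learnable rotation could be realized in PyTorch. The module name RotatedReLU, the per-channel angle parameter theta, and the tan-based slope mapping are illustrative assumptions (reading "rotation" as rotating the two arms of the ReLU graph about the origin); the paper's exact parameterization may differ.

```python
import math

import torch
import torch.nn as nn


class RotatedReLU(nn.Module):
    """Hypothetical rotated ReLU: the standard ReLU graph (slope 0 for x < 0,
    slope 1 for x >= 0) is rotated about the origin by a learnable per-channel
    angle theta, yielding piecewise slopes tan(theta) and tan(theta + pi/4).
    Intended for small |theta|; this is an assumed reading of the idea, not
    the paper's exact formulation."""

    def __init__(self, num_channels: int, init_angle: float = 0.0):
        super().__init__()
        self.theta = nn.Parameter(torch.full((num_channels,), init_angle))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.theta.view(1, -1, 1, 1)           # broadcast over NCHW
        neg_slope = torch.tan(theta)                    # rotated "flat" arm
        pos_slope = torch.tan(theta + math.pi / 4.0)    # rotated "identity" arm
        return torch.where(x >= 0, pos_slope * x, neg_slope * x)


# Channels whose learned slopes collapse toward zero contribute little to the
# layers downstream, so the corresponding filters are candidates for pruning.
act = RotatedReLU(num_channels=64)
y = act(torch.randn(8, 64, 32, 32))
```

With theta = 0 this reduces to the ordinary ReLU, so the rotation adds exactly one extra learnable degree of freedom per channel.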
Related papers
- Improved Vessel Segmentation with Symmetric Rotation-Equivariant U-Net [4.365790707793824]
We propose applying an efficient symmetric rotation-equivariant (SRE) convolutional kernel implementation to the U-Net architecture.
We validate the effectiveness of our method through improved segmentation performance on retina vessel fundus imaging.
Our proposed SRE U-Net not only significantly surpasses standard U-Net in handling rotated images, but also outperforms existing equivariant learning methods.
arXiv Detail & Related papers (2025-01-24T15:54:51Z) - Leaky ReLUs That Differ in Forward and Backward Pass Facilitate Activation Maximization in Deep Neural Networks [0.022344294014777957]
Activation Maximization (AM) strives to generate optimal input, revealing features that trigger high responses in trained deep neural networks.
We show that AM fails to produce optimal input for simple functions containing ReLUs or Leaky ReLUs.
We propose a solution based on using Leaky ReLUs with a high negative slope in the backward pass while keeping the original, usually zero, slope in the forward pass (a minimal sketch of this forward/backward asymmetry appears after this list).
arXiv Detail & Related papers (2024-10-22T12:38:39Z) - REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints [2.9209462960232235]
State-of-the-art machine learning pipelines generate resource-agnostic models that cannot adapt at runtime.
We introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources.
We provide a theoretical result and empirical evidence for REDS' outstanding performance in terms of the submodels' test set accuracy.
arXiv Detail & Related papers (2023-11-22T12:34:51Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network [3.54359747576165]
RevSilo is the first reversible multi-scale feature fusion module.
We create RevBiFPN, a fully reversible bidirectional feature pyramid network.
RevBiFPN provides up to a 2.5% boost in AP over HRNet using fewer MACs and a 2.4x reduction in training-time memory.
arXiv Detail & Related papers (2022-06-28T15:48:05Z) - Structured Sparsity Learning for Efficient Video Super-Resolution [99.1632164448236]
We develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties of video super-resolution (VSR) models.
In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.
arXiv Detail & Related papers (2022-06-15T17:36:04Z) - Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z) - Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient [62.660451283548724]
ResiNet is a reinforcement learning framework to discover resilient network topologies against various disasters and attacks.
We show that ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, with a large margin compared to existing approaches.
arXiv Detail & Related papers (2021-10-18T06:14:28Z) - Learning specialized activation functions with the Piecewise Linear Unit [7.820667552233989]
We propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method.
It can learn specialized activation functions and achieves SOTA performance on large-scale datasets like ImageNet and COCO.
PWLU is also easy to implement and efficient at inference, which can be widely applied in real-world applications.
arXiv Detail & Related papers (2021-04-08T11:29:11Z) - Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve strong performance but require huge computational costs.
We propose to use the shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
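As a companion to the Leaky ReLU entry above, the sketch below illustrates an activation with different forward and backward slopes via a custom autograd function. The class name AsymmetricReLU and the backward slope of 0.3 are placeholders chosen for illustration, not values taken from that paper.

```python
import torch


class AsymmetricReLU(torch.autograd.Function):
    """ReLU in the forward pass (zero negative slope) but leaky in the
    backward pass, so gradients keep flowing through inactive units during
    activation maximization. The backward slope is an illustrative choice."""

    @staticmethod
    def forward(ctx, x, backward_slope=0.3):
        ctx.save_for_backward(x)
        ctx.backward_slope = backward_slope
        return torch.clamp(x, min=0.0)  # ordinary ReLU output

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        slope = torch.where(x >= 0,
                            torch.ones_like(x),
                            torch.full_like(x, ctx.backward_slope))
        # Second return value is the (non-existent) gradient for backward_slope.
        return grad_output * slope, None


# Usage: negative inputs still receive a scaled gradient.
x = torch.randn(4, requires_grad=True)
y = AsymmetricReLU.apply(x)
y.sum().backward()
print(x.grad)
```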
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.