Rotate the ReLU to implicitly sparsify deep networks
- URL: http://arxiv.org/abs/2206.00488v1
- Date: Wed, 1 Jun 2022 13:38:45 GMT
- Title: Rotate the ReLU to implicitly sparsify deep networks
- Authors: Nancy Nayak, Sheetal Kalyani
- Abstract summary: We propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture.
We show that this activation, wherein the rotation is learned via training, results in the elimination of those parameters/filters in the network that are not important for the task.
- Score: 13.203765985718201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of Deep Neural Network based solutions for a variety of real-life
tasks, having a compact and energy-efficient deployable model has become fairly
important. Most of the existing deep architectures use the Rectified Linear Unit
(ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU
activation to give one more degree of freedom to the architecture. We show that
this activation, wherein the rotation is learned via training, results in the
elimination of those parameters/filters in the network that are not important
for the task. In other words, the rotated ReLU appears to perform implicit
sparsification. The slopes of the rotated ReLU activations act as coarse
feature extractors and unnecessary features can be eliminated before
retraining. Our studies indicate that features consistently pass through a
smaller number of filters in architectures such as ResNet and its variants.
Hence, by rotating the ReLU, the weights or the filters that are not necessary
are automatically identified and can be dropped thus giving rise to significant
savings in memory and computation. Furthermore, in some cases, we notice that,
along with the savings in memory and computation, we also obtain improvements
over the reported performance of the corresponding baseline work on popular
datasets such as MNIST, CIFAR-10, CIFAR-100, and SVHN.
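For intuition, the following is a minimal sketch of how such a learnable rotation could be realized in PyTorch. The module name RotatedReLU, the per-channel angle parameter theta, and the tan-based slope mapping are illustrative assumptions (reading "rotation" as rotating the two arms of the ReLU graph about the origin); the paper's exact parameterization may differ.

```python
import math

import torch
import torch.nn as nn


class RotatedReLU(nn.Module):
    """Hypothetical rotated ReLU: the standard ReLU graph (slope 0 for x < 0,
    slope 1 for x >= 0) is rotated about the origin by a learnable per-channel
    angle theta, yielding piecewise slopes tan(theta) and tan(theta + pi/4).
    Intended for small |theta|; this is an assumed reading of the idea, not
    the paper's exact formulation."""

    def __init__(self, num_channels: int, init_angle: float = 0.0):
        super().__init__()
        self.theta = nn.Parameter(torch.full((num_channels,), init_angle))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.theta.view(1, -1, 1, 1)           # broadcast over NCHW
        neg_slope = torch.tan(theta)                    # rotated "flat" arm
        pos_slope = torch.tan(theta + math.pi / 4.0)    # rotated "identity" arm
        return torch.where(x >= 0, pos_slope * x, neg_slope * x)


# Channels whose learned slopes collapse toward zero contribute little to the
# layers downstream, so the corresponding filters are candidates for pruning.
act = RotatedReLU(num_channels=64)
y = act(torch.randn(8, 64, 32, 32))
```

With theta = 0 this reduces to the ordinary ReLU, so the rotation adds exactly one extra learnable degree of freedom per channel.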
Related papers
- Improved Vessel Segmentation with Symmetric Rotation-Equivariant U-Net [4.365790707793824]
We propose applying an efficient symmetric rotation-equivariant (SRE) convolutional kernel implementation to the U-Net architecture.
We validate the effectiveness of our method through improved segmentation performance on retina vessel fundus imaging.
Our proposed SRE U-Net not only significantly surpasses standard U-Net in handling rotated images, but also outperforms existing equivariant learning methods.
arXiv Detail & Related papers (2025-01-24T15:54:51Z) - Leaky ReLUs That Differ in Forward and Backward Pass Facilitate Activation Maximization in Deep Neural Networks [0.022344294014777957]
Activation Maximization (AM) strives to generate optimal input, revealing features that trigger high responses in trained deep neural networks.
We show that AM fails to produce optimal input for simple functions containing ReLUs or Leaky ReLUs.
We propose a solution based on using Leaky ReLUs with a high negative slope in the backward pass while keeping the original, usually zero, slope in the forward pass (a minimal sketch of this forward/backward asymmetry appears after this list).
arXiv Detail & Related papers (2024-10-22T12:38:39Z) - REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints [2.9209462960232235]
State-of-the-art machine learning pipelines generate resource-agnostic models that cannot adapt at runtime.
We introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources.
We provide a theoretical result and empirical evidence for REDS' outstanding performance in terms of the submodels' test set accuracy.
arXiv Detail & Related papers (2023-11-22T12:34:51Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network [3.54359747576165]
RevSilo is the first reversible multi-scale feature fusion module.
We create RevBiFPN, a fully reversible bidirectional feature pyramid network.
RevBiFPN provides up to a 2.5% boost in AP over HRNet using fewer MACs and a 2.4x reduction in training-time memory.
arXiv Detail & Related papers (2022-06-28T15:48:05Z) - Structured Sparsity Learning for Efficient Video Super-Resolution [99.1632164448236]
We develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties of video super-resolution (VSR) models.
In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.
arXiv Detail & Related papers (2022-06-15T17:36:04Z) - Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z) - Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient [62.660451283548724]
ResiNet is a reinforcement learning framework to discover resilient network topologies against various disasters and attacks.
We show that ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, with a large margin compared to existing approaches.
arXiv Detail & Related papers (2021-10-18T06:14:28Z) - Learning specialized activation functions with the Piecewise Linear Unit [7.820667552233989]
We propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method.
It can learn specialized activation functions and achieves SOTA performance on large-scale datasets like ImageNet and COCO.
PWLU is also easy to implement and efficient at inference, which can be widely applied in real-world applications.
arXiv Detail & Related papers (2021-04-08T11:29:11Z) - Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve strong performance but require huge computational costs.
We propose to use the shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
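As a companion to the Leaky ReLU entry above, the sketch below illustrates an activation with different forward and backward slopes via a custom autograd function. The class name AsymmetricReLU and the backward slope of 0.3 are placeholders chosen for illustration, not values taken from that paper.

```python
import torch


class AsymmetricReLU(torch.autograd.Function):
    """ReLU in the forward pass (zero negative slope) but leaky in the
    backward pass, so gradients keep flowing through inactive units during
    activation maximization. The backward slope is an illustrative choice."""

    @staticmethod
    def forward(ctx, x, backward_slope=0.3):
        ctx.save_for_backward(x)
        ctx.backward_slope = backward_slope
        return torch.clamp(x, min=0.0)  # ordinary ReLU output

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        slope = torch.where(x >= 0,
                            torch.ones_like(x),
                            torch.full_like(x, ctx.backward_slope))
        # Second return value is the (non-existent) gradient for backward_slope.
        return grad_output * slope, None


# Usage: negative inputs still receive a scaled gradient.
x = torch.randn(4, requires_grad=True)
y = AsymmetricReLU.apply(x)
y.sum().backward()
print(x.grad)
```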
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.