The Resurrection of the ReLU
- URL: http://arxiv.org/abs/2505.22074v1
- Date: Wed, 28 May 2025 07:55:51 GMT
- Title: The Resurrection of the ReLU
- Authors: Coşku Can Horuz, Geoffrey Kasenbacher, Saya Higuchi, Sebastian Kairat, Jendrik Stoltz, Moritz Pesl, Bernhard A. Moser, Christoph Linse, Thomas Martinetz, Sebastian Otte
- Abstract summary: We introduce surrogate gradient learning for ReLU (SUGAR) as a novel, plug-and-play regularizer for deep architectures. SUGAR preserves the standard ReLU function during the forward pass but replaces its derivative in the backward pass with a smooth surrogate. We demonstrate that SUGAR, when paired with a well-chosen surrogate function, substantially enhances generalization performance on convolutional network architectures.
- Score: 1.0626574691596062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling sophisticated activation functions within deep learning architectures has evolved into a distinct research direction. Functions such as GELU, SELU, and SiLU offer smooth gradients and improved convergence properties, making them popular choices in state-of-the-art models. Despite this trend, the classical ReLU remains appealing due to its simplicity, inherent sparsity, and other advantageous topological characteristics. However, ReLU units are prone to becoming irreversibly inactive - a phenomenon known as the dying ReLU problem - which limits their overall effectiveness. In this work, we introduce surrogate gradient learning for ReLU (SUGAR) as a novel, plug-and-play regularizer for deep architectures. SUGAR preserves the standard ReLU function during the forward pass but replaces its derivative in the backward pass with a smooth surrogate that avoids zeroing out gradients. We demonstrate that SUGAR, when paired with a well-chosen surrogate function, substantially enhances generalization performance over convolutional network architectures such as VGG-16 and ResNet-18, providing sparser activations while effectively resurrecting dead ReLUs. Moreover, we show that even in modern architectures like Conv2NeXt and Swin Transformer - which typically employ GELU - substituting these with SUGAR yields competitive and even slightly superior performance. These findings challenge the prevailing notion that advanced activation functions are necessary for optimal performance. Instead, they suggest that the conventional ReLU, particularly with appropriate gradient handling, can serve as a strong, versatile revived classic across a broad range of deep learning vision models.
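The mechanism described in the abstract (exact ReLU on the forward pass, a smooth surrogate derivative on the backward pass) can be sketched as a custom autograd function. The following is a minimal PyTorch illustration that assumes a SiLU-style surrogate derivative; the paper's actual choice of surrogate function may differ.

```python
import torch
import torch.nn.functional as F


class SugarReLU(torch.autograd.Function):
    """Forward pass: exact ReLU. Backward pass: smooth surrogate derivative.

    The SiLU-style surrogate derivative below is an illustrative assumption,
    not necessarily the surrogate function chosen in the paper.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return F.relu(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Derivative of SiLU: sigma(x) * (1 + x * (1 - sigma(x))).
        # It is smooth, so gradient still reaches units with negative
        # pre-activations that the hard 0/1 ReLU derivative would silence.
        sig = torch.sigmoid(x)
        surrogate_grad = sig * (1 + x * (1 - sig))
        return grad_output * surrogate_grad


def sugar_relu(x):
    """Drop-in replacement for F.relu with a surrogate backward pass."""
    return SugarReLU.apply(x)
```

Under the same SiLU-derivative assumption, an equivalent straight-through formulation is `y = F.silu(x) + (F.relu(x) - F.silu(x)).detach()`, which produces the same forward values and the same surrogate gradient without a custom Function.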
Related papers
- Gradient as Conditions: Rethinking HOG for All-in-one Image Restoration [23.153283910821862]
Histogram of Oriented Gradients (HOG), a classical gradient representation, has strong discriminative capability across diverse degradations. We propose HOGformer, a Transformer-based model that integrates learnable HOG features for degradation-aware restoration. HOGformer achieves state-of-the-art performance and generalizes well to complex real-world scenarios.
arXiv Detail & Related papers (2025-04-12T23:52:59Z) - InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems [76.39776789410088]
This work introduces a framework that combines the strong performance of supervised approaches and the flexibility of zero-shot methods. A novel architectural design seamlessly integrates the degradation operator directly into the denoiser. Experimental results on the FFHQ and ImageNet datasets demonstrate state-of-the-art posterior-sampling performance.
arXiv Detail & Related papers (2025-04-02T12:40:57Z) - Hysteresis Activation Function for Efficient Inference [3.5223695602582614]
We propose a Hysteresis Rectified Linear Unit (HeLU) to address the "dying ReLU" problem with minimal complexity. Unlike traditional activation functions with fixed thresholds for training and inference, HeLU employs a variable threshold that refines the backpropagation.
arXiv Detail & Related papers (2024-11-15T20:46:58Z) - DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution [48.744290794713905]
In real-world scenarios, captured depth data often suffer from unconventional and unknown degradation due to sensor limitations and complex imaging environments. We propose the Degradation Oriented and Regularized Network (DORNet), a novel framework designed to adaptively address unknown degradation in real-world scenes. Our approach begins with the development of a self-supervised degradation learning strategy, which models the degradation representations of low-resolution depth data. To facilitate effective RGB-D fusion, we further introduce a degradation-oriented feature transformation module that selectively propagates RGB content into the depth data based on the learned degradation priors.
arXiv Detail & Related papers (2024-10-15T14:53:07Z) - ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models [3.7802450241986945]
LayerNorm is a critical component in modern large language models (LLMs) for stabilizing training and ensuring smooth optimization.
This work explores desirable activation functions in normalization-free decoder-only LLMs.
ReLU significantly outperforms GELU in LayerNorm-free models, leading to an 8.2% perplexity improvement.
arXiv Detail & Related papers (2024-10-12T20:26:01Z) - Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
arXiv Detail & Related papers (2024-07-01T17:28:59Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection [66.08293086254851]
We propose a reciprocal learning framework to reinforce the learning of a RefSR network.
The newly proposed module aligns reference-input images at multi-scale feature spaces and performs reference-aware feature selection.
We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm.
arXiv Detail & Related papers (2022-11-08T12:39:35Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - ALReLU: A different approach on Leaky ReLU activation function to improve Neural Networks Performance [0.0]
The classical ReLU activation function (AF) has been extensively applied in Deep Neural Networks (DNN). The common gradient issues of ReLU pose challenges for applications in both academia and industry. The Absolute Leaky ReLU (ALReLU) AF, a variation of LReLU, is proposed as an alternative method to resolve the common "dying ReLU" problem.
arXiv Detail & Related papers (2020-12-11T06:46:42Z) - The effect of Target Normalization and Momentum on Dying ReLU [22.41606885255209]
We show that unit-variance targets are well motivated and that ReLUs die more easily when the target variance approaches zero.
We also analyze the gradients of a single-ReLU model to identify saddle points and regions corresponding to dying ReLU (a minimal sketch of this effect appears after this list).
arXiv Detail & Related papers (2020-05-13T08:01:13Z)
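To make the dying-ReLU behaviour analyzed in the last entry concrete, here is a small, self-contained sketch (a hypothetical setup, not the cited paper's experiment): once a unit's pre-activation is negative for every input, the hard ReLU derivative is zero everywhere, so no gradient reaches the weights and the unit cannot recover.

```python
import torch

# Hypothetical single-ReLU regression with data constructed so the unit is dead.
x = torch.rand(128, 10)                             # inputs in [0, 1)
w = torch.full((10, 1), -1.0, requires_grad=True)   # weights that keep the unit negative
b = torch.tensor([-1.0], requires_grad=True)        # negative bias -> pre-activation <= -1

pre = x @ w + b              # always negative for this data
out = torch.relu(pre)        # forward pass: all zeros, the unit never fires
loss = ((out - 1.0) ** 2).mean()
loss.backward()

print(out.abs().max().item())                                 # 0.0: no activation
print(w.grad.abs().max().item(), b.grad.abs().max().item())   # 0.0 0.0: no gradient, dead unit
```

With a surrogate backward pass such as SUGAR's in place of the hard 0/1 derivative, the same unit would still receive a gradient signal and could be pulled back toward the active region.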