ULU: A Unified Activation Function
- URL: http://arxiv.org/abs/2508.05073v1
- Date: Thu, 07 Aug 2025 06:58:22 GMT
- Title: ULU: A Unified Activation Function
- Authors: Simin Huo
- Abstract summary: ULU treats positive and negative inputs differently. Experiments demonstrate ULU significantly outperforms ReLU and Mish across image classification and object detection tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose \textbf{ULU}, a novel non-monotonic, piecewise activation function defined as $\{f(x;\alpha_1),\, x<0;\; f(x;\alpha_2),\, x\geq 0\}$, where $f(x;\alpha)=0.5x(\tanh(\alpha x)+1)$, $\alpha>0$. ULU treats positive and negative inputs differently. Extensive experiments demonstrate ULU significantly outperforms ReLU and Mish across image classification and object detection tasks. Its variant Adaptive ULU (\textbf{AULU}) is expressed as $\{f(x;\beta_1^2),\, x<0;\; f(x;\beta_2^2),\, x\geq 0\}$, where $\beta_1$ and $\beta_2$ are learnable parameters, enabling it to adapt its response separately for positive and negative inputs. Additionally, we introduce the LIB (Like Inductive Bias) metric from AULU to quantitatively measure the inductive bias of the model.
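The abstract defines ULU and AULU only as formulas; the following minimal PyTorch sketch is one way to implement those definitions. It is not the authors' reference code, and the initial values of alpha1, alpha2, and the learnable betas are placeholder assumptions (the abstract does not state defaults).

```python
import torch
import torch.nn as nn


def _f(x: torch.Tensor, alpha) -> torch.Tensor:
    """Branch function from the abstract: f(x; alpha) = 0.5 * x * (tanh(alpha * x) + 1), alpha > 0."""
    return 0.5 * x * (torch.tanh(alpha * x) + 1.0)


class ULU(nn.Module):
    """ULU with fixed slopes: alpha1 for x < 0, alpha2 for x >= 0 (default values are placeholders)."""

    def __init__(self, alpha1: float = 1.0, alpha2: float = 1.0):
        super().__init__()
        self.alpha1 = alpha1
        self.alpha2 = alpha2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.where(x < 0, _f(x, self.alpha1), _f(x, self.alpha2))


class AULU(nn.Module):
    """Adaptive ULU: branch slopes are beta1**2 and beta2**2 with learnable beta1, beta2."""

    def __init__(self, beta_init: float = 1.0):
        super().__init__()
        self.beta1 = nn.Parameter(torch.tensor(beta_init))
        self.beta2 = nn.Parameter(torch.tensor(beta_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.where(x < 0, _f(x, self.beta1 ** 2), _f(x, self.beta2 ** 2))


# Quick illustrative check:
x = torch.linspace(-3, 3, 7)
print(ULU()(x))
print(AULU()(x))
```

Squaring the learnable betas keeps each effective slope non-negative, consistent with the constraint $\alpha>0$ in the abstract.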
Related papers
- Surrogate to Poincaré inequalities on manifolds for dimension reduction in nonlinear feature spaces [49.1574468325115]
We aim to approximate a continuously differentiable function $u:\mathbb{R}^d \rightarrow \mathbb{R}$ by a composition of functions $f\circ g$ where $g:\mathbb{R}^d \rightarrow \mathbb{R}^m$, $m\leq d$, and $f:\mathbb{R}^m \rightarrow \mathbb{R}$. For a fixed $g$, we build $f$ using classical regression methods, involving evaluations
arXiv Detail & Related papers (2025-05-03T12:37:27Z) - Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics [39.0860823332923]
GoLU is a novel self-gated activation function defined as $\mathrm{GoLU}(x) = x\,\mathrm{Gompertz}(x)$, where $\mathrm{Gompertz}(x) = e^{-e^{-x}}$ (a minimal sketch of this formula appears after this list). GoLU outperforms state-of-the-art activation functions, establishing it as a robust alternative to existing activation functions.
arXiv Detail & Related papers (2025-02-05T22:32:22Z) - Deriving Activation Functions Using Integration [8.345753173238956]
We introduce the Expanded Integral of the Exponential Linear Unit (xIELU), a trainable piecewise activation function derived by integrating trainable affine transformations. xIELU combines two key properties for the gradient: (1) a trainable and linearly increasing gradient for positive inputs, similar to Squared ReLU (ReLU$^2$), and (2) a trainable gradient that can take negative values for negative inputs, inspired by Expanded SiLU (xSiLU). In experiments with 1.1B and 3B parameter Llama models trained on 125B tokens of FineWeb Edu, xIELU achieves lower
arXiv Detail & Related papers (2024-11-20T03:24:21Z) - IT$^3$: Idempotent Test-Time Training [95.78053599609044]
Deep learning models often struggle when deployed in real-world settings due to distribution shifts between training and test data. We present Idempotent Test-Time Training (IT$^3$), a novel approach that enables on-the-fly adaptation to distribution shifts using only the current test instance. Our results suggest that idempotence provides a universal principle for test-time adaptation that generalizes across domains and architectures.
arXiv Detail & Related papers (2024-10-05T15:39:51Z) - TaLU: A Hybrid Activation Function Combining Tanh and Rectified Linear Unit to Enhance Neural Networks [1.3477333339913569]
TaLU is a modified activation function combining Tanh and ReLU, which mitigates the dying gradient problem of ReLU.
A deep learning model with the proposed activation function was tested on MNIST and CIFAR-10.
arXiv Detail & Related papers (2023-05-08T01:13:59Z) - Layered State Discovery for Incremental Autonomous Exploration [106.37656068276901]
Layered Autonomous Exploration (LAE) is a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(L S^{\rightarrow}_{L} A \ln^{12}(S^{\rightarrow}_{L}))$
arXiv Detail & Related papers (2023-02-07T22:58:12Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Feature Cross Search via Submodular Optimization [58.15569071608769]
We study feature cross search as a fundamental primitive in feature engineering.
We show that there exists a simple greedy $(1-1/e)$-approximation algorithm for this problem.
arXiv Detail & Related papers (2021-07-05T16:58:31Z) - Submodular + Concave [53.208470310734825]
It has been well established that first order optimization methods can converge to the maximal objective value of concave functions.
In this work, we initiate the study of maximizing smooth functions of the form $F(x) = G(x) + C(x)$ over a convex body.
This class of functions is an extension of both concave and continuous DR-submodular functions for which no guarantee is known.
arXiv Detail & Related papers (2021-06-09T01:59:55Z) - Nonparametric Learning of Two-Layer ReLU Residual Units [22.870658194212744]
We describe an algorithm that learns two-layer residual units with rectified linear unit (ReLU) activation.
We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities.
We prove the statistical strong consistency of our algorithm, and demonstrate its robustness and sample efficiency through experiments.
arXiv Detail & Related papers (2020-08-17T22:11:26Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Gaussian Error Linear Units (GELUs) [58.195342948092964]
We propose a neural network activation function that weights inputs by their value, rather than gates by their sign.
We find performance improvements across all considered computer vision, natural language processing, and speech tasks.
arXiv Detail & Related papers (2016-06-27T19:20:40Z)
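As referenced in the Gompertz Linear Units entry above, a minimal sketch of that gate formula (not the GoLU authors' code, just the stated closed form) is:

```python
import torch


def golu(x: torch.Tensor) -> torch.Tensor:
    """GoLU(x) = x * Gompertz(x), with Gompertz(x) = exp(-exp(-x)), per the entry above."""
    return x * torch.exp(-torch.exp(-x))


# Example: golu(torch.linspace(-3, 3, 7))
```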