Learnable polynomial, trigonometric, and tropical activations
- URL: http://arxiv.org/abs/2502.01247v1
- Date: Mon, 03 Feb 2025 11:13:58 GMT
- Title: Learnable polynomial, trigonometric, and tropical activations
- Authors: Ismail Khalfaoui-Hassani, Stefan Kesselheim,
- Abstract summary: This paper investigates scalable neural networks with learnable activation functions based on function bases and tropicals.<n>We propose a scheme that preserves unitary variance in transformers and convolutional networks, ensuring stable gradient flow even in deep architectures.<n>Experiments demonstrate that networks with Hermite, Fourier, and Tropical-based learnable activations significantly improve over GPT-2 and ConvNeXt networks in terms of accuracy and perplexity in train and test.
- Score: 1.534667887016089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates scalable neural networks with learnable activation functions based on orthogonal function bases and tropical polynomials, targeting ImageNet-1K classification and next token prediction on OpenWebText. Traditional activations, such as ReLU, are static. In contrast, learnable activations enable the network to adapt dynamically during training. However, stability issues, such as vanishing or exploding gradients, arise with improper variance management in deeper networks. To remedy this, we propose an initialization scheme that single-handedly preserves unitary variance in transformers and convolutional networks, ensuring stable gradient flow even in deep architectures. Extensive experiments demonstrate that networks with Hermite, Fourier, and Tropical-based learnable activations significantly improve over GPT-2 and ConvNeXt networks in terms of accuracy and perplexity in train and test, highlighting the viability of learnable activations in large-scale tasks. The activation functions developed here are the subject of a library coded entirely in pure PyTorch: torchortho, available at https://github.com/K-H-Ismail/torchortho.
Related papers
- Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning [78.88684753303794]
Deep learning has predominantly advanced through applications in computer vision and natural language processing.<n>Neural operators are a principled way to generalize neural networks to mappings between function spaces.<n>This paper identifies and distills the key principles for constructing practical implementations of mappings between infinite-dimensional function spaces.
arXiv Detail & Related papers (2025-06-12T17:59:31Z) - Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning [4.051777802443125]
Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations.
We introduce Gradient SAEs, which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function.
We find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts.
arXiv Detail & Related papers (2024-11-15T18:03:52Z) - Multilinear Operator Networks [60.7432588386185]
Polynomial Networks is a class of models that does not require activation functions.
We propose MONet, which relies solely on multilinear operators.
arXiv Detail & Related papers (2024-01-31T16:52:19Z) - Regularization of polynomial networks for image recognition [78.4786845859205]
Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability.
We introduce a class of PNs, which are able to reach the performance of ResNet across a range of six benchmarks.
arXiv Detail & Related papers (2023-03-24T10:05:22Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Unification of popular artificial neural network activation functions [0.0]
We present a unified representation of the most popular neural network activation functions.
Adopting Mittag-Leffler functions of fractional calculus, we propose a flexible and compact functional form.
arXiv Detail & Related papers (2023-02-21T21:20:59Z) - Simple initialization and parametrization of sinusoidal networks via
their kernel bandwidth [92.25666446274188]
sinusoidal neural networks with activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Dynamics-aware Adversarial Attack of Adaptive Neural Networks [75.50214601278455]
We investigate the dynamics-aware adversarial attack problem of adaptive neural networks.
We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
Our LGM achieves impressive adversarial attack performance compared with the dynamic-unaware attack methods.
arXiv Detail & Related papers (2022-10-15T01:32:08Z) - Rapid training of deep neural networks without skip connections or
normalization layers using Deep Kernel Shaping [46.083745557823164]
We identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data.
We show how these can be avoided by carefully controlling the "shape" of the network's kernel function.
arXiv Detail & Related papers (2021-10-05T00:49:36Z) - Ladder Polynomial Neural Networks [6.902168821854859]
Polynomial functions have plenty of useful analytical properties, but they are rarely used as learning models because their function class is considered to be restricted.
This work constructs feedforward neural networks using the product activation, a new activation function constructed from multiplications.
arXiv Detail & Related papers (2021-06-25T18:16:48Z) - Learning specialized activation functions with the Piecewise Linear Unit [7.820667552233989]
We propose a new activation function called Piecewise Linear Unit(PWLU), which incorporates a carefully designed formulation and learning method.
It can learn specialized activation functions and achieves SOTA performance on large-scale datasets like ImageNet and COCO.
PWLU is also easy to implement and efficient at inference, which can be widely applied in real-world applications.
arXiv Detail & Related papers (2021-04-08T11:29:11Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture method for initializing neural networks.
It is based on a simple agnostic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - A Use of Even Activation Functions in Neural Networks [0.35172332086962865]
We propose an alternative approach to integrate existing knowledge or hypotheses of data structure by constructing custom activation functions.
We show that using an even activation function in one of the fully connected layers improves neural network performance.
arXiv Detail & Related papers (2020-11-23T20:33:13Z) - Deep Polynomial Neural Networks [77.70761658507507]
$Pi$Nets are a new class of function approximators based on expansions.
$Pi$Nets produce state-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z) - A survey on modern trainable activation functions [0.0]
We propose a taxonomy of trainable activation functions and highlight common and distinctive proprieties of recent and past models.
We show that many of the proposed approaches are equivalent to adding neuron layers which use fixed (non-trainable) activation functions.
arXiv Detail & Related papers (2020-05-02T12:38:43Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Tunable Quantum Neural Networks for Boolean Functions [0.0]
We introduce the idea of a generic quantum circuit whose gates can be tuned to learn any Boolean functions.
In order to perform the learning task, we have devised an algorithm that leverages the absence of measurements.
arXiv Detail & Related papers (2020-03-31T11:55:01Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.