Related papers: Improving Quaternion Neural Networks with Quaternionic Activation Functions

URL: http://arxiv.org/abs/2406.16481v1
Date: Mon, 24 Jun 2024 09:36:58 GMT
Title: Improving Quaternion Neural Networks with Quaternionic Activation Functions
Authors: Johannes Pöppelbaum, Andreas Schwung
Abstract summary: We propose novel quaternion activation functions where we modify either the quaternion magnitude or the phase. The proposed activation functions can be incorporated in arbitrary quaternion valued neural networks trained with gradient descent techniques.
Score: 3.8750364147156247
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In this paper, we propose novel quaternion activation functions where we modify either the quaternion magnitude or the phase, as an alternative to the commonly used split activation functions. We define criteria that are relevant for quaternion activation functions, and subsequently we propose our novel activation functions based on this analysis. Instead of applying a known activation function like the ReLU or Tanh on the quaternion elements separately, these activation functions consider the quaternion properties and respect the quaternion space $\mathbb{H}$. In particular, all quaternion components are utilized to calculate all output components, carrying out the benefit of the Hamilton product in e.g. the quaternion convolution to the activation functions. The proposed activation functions can be incorporated in arbitrary quaternion valued neural networks trained with gradient descent techniques. We further discuss the derivatives of the proposed activation functions where we observe beneficial properties for the activation functions affecting the phase. Specifically, they prove to be sensitive on basically the whole input range, thus improved gradient flow can be expected. We provide an elaborate experimental evaluation of our proposed quaternion activation functions including comparison with the split ReLU and split Tanh on two image classification tasks using the CIFAR-10 and SVHN dataset. There, especially the quaternion activation functions affecting the phase consistently prove to provide better performance.
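To make the contrast in the abstract concrete, here is a minimal sketch of a split activation next to an activation that acts on the quaternion magnitude. It is not the authors' formulation: `split_relu` follows the usual element-wise definition, while `magnitude_tanh` (rescaling the norm with tanh and keeping the direction) is a hypothetical stand-in chosen only to show how all four input components can influence every output component. A quaternion is represented here as a plain 4-tuple, which is also an assumption.

```python
import math


def quat_norm(q):
    """Euclidean norm of a quaternion given as a 4-tuple (a, b, c, d)."""
    return math.sqrt(sum(c * c for c in q))


def split_relu(q):
    """Split activation: ReLU applied to each component independently."""
    return tuple(max(0.0, c) for c in q)


def magnitude_tanh(q, eps=1e-12):
    """Hypothetical magnitude-based activation (not from the paper).

    The norm |q| is squashed to tanh(|q|) while the direction q / |q| is
    kept, so every output component depends on all four input components.
    """
    n = quat_norm(q)
    if n < eps:
        # The direction is undefined at the origin; return the zero quaternion.
        return (0.0, 0.0, 0.0, 0.0)
    scale = math.tanh(n) / n
    return tuple(scale * c for c in q)


q = (1.0, -2.0, 0.5, -0.1)
split_out = split_relu(q)        # negative components are zeroed one by one
magnitude_out = magnitude_tanh(q)  # all components are rescaled jointly
```

In the split variant, changing one component never affects the others. In the magnitude variant, changing any component changes the norm and therefore rescales all four outputs.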

Related papers

Provable In-Context Learning of Nonlinear Regression with Transformers [58.018629320233174]
In-context learning (ICL) is the ability to perform unseen tasks using task-specific prompts without updating parameters. Recent research has actively explored the training dynamics behind ICL. This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z)
Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs). We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions. PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
STL: A Signed and Truncated Logarithm Activation Function for Neural Networks [5.9622541907827875]
Activation functions play an essential role in neural networks. We present a novel signed and truncated logarithm function as activation function. The suggested activation function can be applied in a large range of neural networks.
arXiv Detail & Related papers (2023-07-31T03:41:14Z)
LayerAct: Advanced activation mechanism utilizing layer-direction normalization for CNNs with BatchNorm [3.413632819633068]
LayerAct functions are designed to be more noise-robust compared to existing element-level activation functions. We show that LayerAct functions exhibit superior noise-robustness compared to element-level activation functions.
arXiv Detail & Related papers (2023-06-08T05:13:34Z)
Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
Special functions in quantum phase estimation [61.12008553173672]
We focus on two special functions. One is prolate spheroidal wave function, which approximately gives the maximum probability that the difference between the true parameter and the estimate is smaller than a certain threshold. The other is Mathieu function, which exactly gives the optimum estimation under the energy constraint.
arXiv Detail & Related papers (2023-02-14T08:33:24Z)
Nish: A Novel Negative Stimulated Hybrid Activation Function [5.482532589225552]
We propose a novel non-monotonic activation function called Negative Stimulated Hybrid Activation Function (Nish). It behaves like a Rectified Linear Unit (ReLU) function for values greater than zero, and a sinus-sigmoidal function for values less than zero. The proposed function incorporates the sigmoid and sine wave, allowing new dynamics over traditional ReLU activations.
arXiv Detail & Related papers (2022-10-17T13:32:52Z)
Transformers with Learnable Activation Functions [63.98696070245065]
We use Rational Activation Function (RAF) to learn optimal activation functions during training according to input data. RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
Gate-based spin readout of hole quantum dots with site-dependent $g-$factors [101.23523361398418]
We experimentally investigate a hole double quantum dot in silicon by carrying out spin readout with gate-based reflectometry. We show that characteristic features in the reflected phase signal arising from magneto-spectroscopy convey information on site-dependent $g-$factors in the two dots.
arXiv Detail & Related papers (2022-06-27T09:07:20Z)
Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs). Due to the complex non-linear characteristic of samples, the objective of those activation functions is to project samples from their original feature space to a linear separable feature space. This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
Activation Functions: Dive into an optimal activation function [1.52292571922932]
We find an optimal activation function by defining it as a weighted sum of existing activation functions. The study uses three activation functions, ReLU, tanh, and sin, over three popular image datasets.
arXiv Detail & Related papers (2022-02-24T12:44:11Z)
Effect of the output activation function on the probabilities and errors in medical image segmentation [3.0625089376654664]
Sigmoid activation is the standard output activation function in binary classification and segmentation with neural networks. We consider how the behavior of different output activation and loss functions affects the prediction probabilities and the corresponding segmentation errors.
arXiv Detail & Related papers (2021-09-02T12:51:14Z)
An Investigation of Potential Function Designs for Neural CRF [75.79555356970344]
In this paper, we investigate a series of increasingly expressive potential functions for neural CRF models. Our experiments show that the decomposed quadrilinear potential function based on the vector representations of two neighboring labels and two neighboring words consistently achieves the best performance.
arXiv Detail & Related papers (2020-11-11T07:32:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.