PHYDI: Initializing Parameterized Hypercomplex Neural Networks as
Identity Functions
- URL: http://arxiv.org/abs/2310.07612v1
- Date: Wed, 11 Oct 2023 15:56:55 GMT
- Title: PHYDI: Initializing Parameterized Hypercomplex Neural Networks as
Identity Functions
- Authors: Matteo Mancanelli, Eleonora Grassucci, Aurelio Uncini, and Danilo
Comminiello
- Abstract summary: Parameterized hypercomplex neural networks (PHNNs) are growing in size, yet no techniques have been adopted so far to control their convergence at large scale.
In this paper, we study PHNN convergence and propose a method to improve it at different scales.
We show the effectiveness of this approach on different benchmarks and with common ResNet- and Transformer-based PHNN architectures.
- Score: 9.836302410524842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural models based on hypercomplex algebra systems are growing and
proliferating for a plethora of applications, ranging from computer vision to
natural language processing. Hand in hand with their adoption, parameterized
hypercomplex neural networks (PHNNs) are growing in size, and no techniques have
been adopted so far to control their convergence at a large scale. In this
paper, we study the convergence of PHNNs and propose parameterized hypercomplex
identity initialization (PHYDI), a method to improve their convergence at
different scales, leading to more robust performance when the number of layers
scales up, while also reaching the same performance with fewer iterations. We
show the effectiveness of this approach on different benchmarks and with common
ResNet- and Transformer-based PHNN architectures. The code is available
at https://github.com/ispamm/PHYDI.
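As a rough illustration of the two ingredients the abstract names, the sketch below (PyTorch, assuming the common parameterized hypercomplex multiplication in which the weight is built as a sum of Kronecker products, W = sum_i A_i ⊗ F_i) wraps a small PHM block in a residual connection whose gate is initialized to zero, so the block computes the identity function at initialization. The zero-gate trick is only a stand-in for the actual PHYDI scheme, and all class names and hyperparameters here are illustrative; the authors' implementation is in the linked repository.

```python
# Hedged sketch: a parameterized hypercomplex (PHM) linear layer and a residual
# wrapper that is the identity function at initialization. Illustrative only;
# not the authors' exact PHYDI initialization.
import torch
import torch.nn as nn


class PHMLinear(nn.Module):
    """Linear layer whose weight is a sum of Kronecker products,
    W = sum_i A_i (x) F_i, cutting parameters by roughly a factor of n."""

    def __init__(self, n: int, in_features: int, out_features: int):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)  # algebra-defining matrices
        self.F = nn.Parameter(torch.randn(n, out_features // n, in_features // n) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the full weight as a sum of Kronecker products.
        W = torch.stack([torch.kron(self.A[i], self.F[i]) for i in range(self.n)]).sum(0)
        return x @ W.T + self.bias


class IdentityInitResidual(nn.Module):
    """Residual block f(x) = x + alpha * g(x) with alpha initialized to zero,
    so the whole block is the identity map before training starts."""

    def __init__(self, n: int, features: int):
        super().__init__()
        self.branch = nn.Sequential(
            PHMLinear(n, features, features),
            nn.ReLU(),
            PHMLinear(n, features, features),
        )
        self.alpha = nn.Parameter(torch.zeros(1))  # identity at initialization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.branch(x)


if __name__ == "__main__":
    block = IdentityInitResidual(n=4, features=64)
    x = torch.randn(8, 64)
    assert torch.allclose(block(x), x)  # identity mapping before any training
```

Stacking many such blocks leaves the forward signal unchanged at initialization regardless of depth, which is the kind of behavior that makes convergence less sensitive to the number of layers.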
Related papers
- Trainable Adaptive Activation Function Structure (TAAFS) Enhances Neural Network Force Field Performance with Only Dozens of Additional Parameters [0.0]
We introduce the Trainable Adaptive Activation Function Structure (TAAFS), a method that selects distinct mathematical formulations for non-linear activations.
In this study, we integrate TAAFS into a variety of neural network models and observe accuracy improvements.
arXiv Detail & Related papers (2024-12-19T09:06:39Z) - Deep-Unrolling Multidimensional Harmonic Retrieval Algorithms on Neuromorphic Hardware [78.17783007774295]
This paper explores the potential of conversion-based neuromorphic algorithms for highly accurate and energy-efficient single-snapshot multidimensional harmonic retrieval.
A novel method for converting the complex-valued convolutional layers and activations into spiking neural networks (SNNs) is developed.
The converted SNNs achieve almost five-fold power efficiency at moderate performance loss compared to the original CNNs.
arXiv Detail & Related papers (2024-12-05T09:41:33Z) - Geometry Aware Meta-Learning Neural Network for Joint Phase and Precoder Optimization in RIS [9.20186865054847]
We propose a complex-valued, geometry-aware meta-learning neural network that maximizes the weighted sum rate in a multi-user multiple-input single-output system.
We use a complex-valued neural network for the phase shifts and an Euler-inspired update for the precoder network.
Our approach outperforms existing neural network-based algorithms, offering higher weighted sum rates, lower power consumption, and significantly faster convergence.
arXiv Detail & Related papers (2024-09-17T15:20:23Z) - Towards Explaining Hypercomplex Neural Networks [6.543091030789653]
Hypercomplex neural networks are gaining increasing interest in the deep learning community.
In this paper, we propose inherently interpretable PHNNs and quaternion-like networks.
We draw insights into how this unique branch of neural models operates.
arXiv Detail & Related papers (2024-03-26T17:58:07Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can be easily changed by training the networks with better hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - SpikingJelly: An open-source machine learning infrastructure platform
for spike-based intelligence [51.6943465041708]
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency.
We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips.
arXiv Detail & Related papers (2023-10-25T13:15:17Z) - NAR-Former: Neural Architecture Representation Learning towards Holistic
Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z) - Auto-PINN: Understanding and Optimizing Physics-Informed Neural
Architecture [77.59766598165551]
Physics-informed neural networks (PINNs) are revolutionizing science and engineering practice by bringing the power of deep learning to bear on scientific computation.
Here, we propose Auto-PINN, which applies Neural Architecture Search (NAS) techniques to PINN design.
A comprehensive set of pre-experiments using standard PDE benchmarks allows us to probe the structure-performance relationship in PINNs.
arXiv Detail & Related papers (2022-05-27T03:24:31Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Differentiable Neural Architecture Learning for Efficient Neural Network
Design [31.23038136038325]
We introduce a novel architecture parameterization based on the scaled sigmoid function (see the scaled-sigmoid gating sketch after this list).
We then propose a general Differentiable Neural Architecture Learning (DNAL) method to optimize the neural architecture without the need to evaluate candidate neural networks.
arXiv Detail & Related papers (2021-03-03T02:03:08Z) - Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely the Poincaré ball model (a Möbius addition sketch follows this list).
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as their stability and improved performance over Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z)
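For the Differentiable Neural Architecture Learning (DNAL) entry above, a minimal sketch of a scaled-sigmoid architecture gate is given below, assuming a per-channel gate sigmoid(beta * alpha) whose scale beta is raised during training so each gate saturates toward a binary keep/drop decision. The module name, the NCHW layout, and the annealing loop are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a scaled-sigmoid architecture gate for channel selection.
import torch
import torch.nn as nn


class ScaledSigmoidGate(nn.Module):
    """Scales each channel by sigmoid(beta * alpha); increasing beta pushes the
    soft gates toward 0 or 1, turning them into discrete architecture choices."""

    def __init__(self, num_channels: int, beta: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.randn(num_channels) * 0.1)  # learnable logits
        self.beta = beta                                             # annealed externally

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.beta * self.alpha)                 # shape (C,)
        return x * gate.view(1, -1, 1, 1)                            # NCHW feature maps


if __name__ == "__main__":
    gate = ScaledSigmoidGate(num_channels=16)
    x = torch.randn(2, 16, 8, 8)
    for beta in (1.0, 10.0, 100.0):   # toy annealing schedule
        gate.beta = beta
        _ = gate(x)
    print(torch.sigmoid(gate.beta * gate.alpha))  # with large beta, gates sit near 0 or 1
```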
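The Hyperbolic Neural Networks++ entry builds its layers in the Poincaré ball model. The snippet below is a sketch of Möbius addition, the standard operation underlying Poincaré-ball layers; it is textbook background rather than the paper's extended components.

```python
# Möbius addition on the Poincaré ball with curvature -c (standard definition).
import torch


def mobius_add(x: torch.Tensor, y: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    xy = (x * y).sum(dim=-1, keepdim=True)   # <x, y>
    x2 = (x * x).sum(dim=-1, keepdim=True)   # ||x||^2
    y2 = (y * y).sum(dim=-1, keepdim=True)   # ||y||^2
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den.clamp_min(1e-15)


if __name__ == "__main__":
    x = torch.randn(4, 8) * 0.1   # points well inside the unit ball
    y = torch.randn(4, 8) * 0.1
    print(mobius_add(x, y).norm(dim=-1))  # results remain inside the ball (norm < 1)
```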
This list is automatically generated from the titles and abstracts of the papers in this site.