PHYDI: Initializing Parameterized Hypercomplex Neural Networks as
Identity Functions
- URL: http://arxiv.org/abs/2310.07612v1
- Date: Wed, 11 Oct 2023 15:56:55 GMT
- Title: PHYDI: Initializing Parameterized Hypercomplex Neural Networks as
Identity Functions
- Authors: Matteo Mancanelli, Eleonora Grassucci, Aurelio Uncini, and Danilo
Comminiello
- Abstract summary: Parameterized hypercomplex neural networks (PHNNs) are growing in size, and no techniques have been adopted so far to control their convergence at a large scale.
In this paper, we study PHNN convergence and propose a method to improve it at different scales.
We show the effectiveness of this approach on different benchmarks and with common ResNet- and Transformer-based PHNNs.
- Score: 9.836302410524842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural models based on hypercomplex algebra systems are growing and
proliferating across a plethora of applications, ranging from computer vision
to natural language processing. Hand in hand with their adoption, parameterized
hypercomplex neural networks (PHNNs) are growing in size, and no techniques
have been adopted so far to control their convergence at a large scale. In this
paper, we study PHNN convergence and propose parameterized hypercomplex
identity initialization (PHYDI), a method to improve their convergence at
different scales, leading to more robust performance when the number of layers
scales up, while also reaching the same performance with fewer iterations. We
show the effectiveness of this approach on different benchmarks and with
common ResNet- and Transformer-based PHNNs. The code is available at
https://github.com/ispamm/PHYDI.
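As a rough illustration of the ideas involved (a minimal sketch under stated assumptions, not the authors' implementation, which lives in the linked repository): PHNNs build their layers from parameterized hypercomplex multiplications (PHM), whose weight matrix is assembled as a sum of Kronecker products, and an identity-style initialization can be obtained, for example, by zero-initializing a learnable gate on each residual branch so that every block computes the identity map at the start of training. The class names, the gating mechanism, and the hyperparameters below are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the PHYDI authors' code):
# a parameterized hypercomplex multiplication (PHM) linear layer and a
# residual block that is initialized to behave as the identity function.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PHMLinear(nn.Module):
    """PHM layer: weight built as W = sum_i kron(A_i, S_i) over n sub-algebras."""

    def __init__(self, n: int, in_features: int, out_features: int):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # A_i parameterize the (learned) hypercomplex multiplication rule,
        # S_i are the small weight blocks combined via Kronecker products.
        self.A = nn.Parameter(torch.randn(n, n, n))
        self.S = nn.Parameter(0.02 * torch.randn(n, out_features // n, in_features // n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the full weight matrix from the Kronecker products.
        W = sum(torch.kron(self.A[i], self.S[i]) for i in range(self.n))
        return F.linear(x, W)


class IdentityInitPHMBlock(nn.Module):
    """Residual PHM block that computes the identity map at initialization."""

    def __init__(self, n: int, dim: int):
        super().__init__()
        self.branch = nn.Sequential(
            PHMLinear(n, dim, dim),
            nn.ReLU(),
            PHMLinear(n, dim, dim),
        )
        # Zero-initialized learnable gate: at initialization the residual
        # branch contributes nothing, so the block outputs its input unchanged
        # (one possible way to realize identity initialization).
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.gate * self.branch(x)


if __name__ == "__main__":
    block = IdentityInitPHMBlock(n=4, dim=16)
    x = torch.randn(8, 16)
    assert torch.allclose(block(x), x)  # identity function at initialization
```

At initialization the forward pass reduces to the skip connection, which is the property identity initialization aims for; during training the gate and the PHM weights move away from zero as needed.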
Related papers
- Geometry Aware Meta-Learning Neural Network for Joint Phase and Precoder Optimization in RIS [9.20186865054847]
We propose a complex-valued, geometry-aware meta-learning neural network that maximizes the weighted sum rate in a multi-user multiple-input single-output system.
We use a complex-valued neural network for the phase shifts and an Euler-inspired update for the precoder network.
Our approach outperforms existing neural network-based algorithms, offering higher weighted sum rates, lower power consumption, and significantly faster convergence.
arXiv Detail & Related papers (2024-09-17T15:20:23Z)
- Towards Explaining Hypercomplex Neural Networks [6.543091030789653]
Hypercomplex neural networks are gaining increasing interest in the deep learning community.
In this paper, we propose inherently interpretable PHNNs and quaternion-like networks.
We draw insights into how this unique branch of neural models operates.
arXiv Detail & Related papers (2024-03-26T17:58:07Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change once the networks are trained with better, architecture-dependent hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence [51.6943465041708]
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency.
We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips.
arXiv Detail & Related papers (2023-10-25T13:15:17Z)
- Convergence and scaling of Boolean-weight optimization for hardware reservoirs [0.0]
We analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimizing the readout layer of a random, recurrently connected neural network.
Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment.
arXiv Detail & Related papers (2023-05-13T12:15:25Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Auto-PINN: Understanding and Optimizing Physics-Informed Neural Architecture [77.59766598165551]
Physics-informed neural networks (PINNs) are revolutionizing science and engineering practice by bringing the power of deep learning to bear on scientific computation.
Here, we propose Auto-PINN, which applies Neural Architecture Search (NAS) techniques to PINN design.
A comprehensive set of pre-experiments using standard PDE benchmarks allows us to probe the structure-performance relationship in PINNs.
arXiv Detail & Related papers (2022-05-27T03:24:31Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Differentiable Neural Architecture Learning for Efficient Neural Network Design [31.23038136038325]
We introduce a novel architecture parameterization based on a scaled sigmoid function.
We then propose a general Differentiable Neural Architecture Learning (DNAL) method to optimize the neural architecture without the need to evaluate candidate neural networks.
arXiv Detail & Related papers (2021-03-03T02:03:08Z)
- Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as greater stability and better performance than their Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.