PHYDI: Initializing Parameterized Hypercomplex Neural Networks as
Identity Functions
- URL: http://arxiv.org/abs/2310.07612v1
- Date: Wed, 11 Oct 2023 15:56:55 GMT
- Title: PHYDI: Initializing Parameterized Hypercomplex Neural Networks as
Identity Functions
- Authors: Matteo Mancanelli, Eleonora Grassucci, Aurelio Uncini, and Danilo
Comminiello
- Abstract summary: Parameterized hypercomplex neural networks (PHNNs) are growing in size, and no techniques have been adopted so far to control their convergence at a large scale.
In this paper, we study PHNN convergence and propose a method to improve it at different scales.
We show the effectiveness of this approach on different benchmarks and with common ResNet- and Transformer-based PHNNs.
- Score: 9.836302410524842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural models based on hypercomplex algebra systems are growing and
proliferating across a plethora of applications, ranging from computer vision
to natural language processing. Hand in hand with their adoption, parameterized
hypercomplex neural networks (PHNNs) are growing in size, and no techniques
have been adopted so far to control their convergence at a large scale. In this
paper, we study PHNN convergence and propose parameterized hypercomplex
identity initialization (PHYDI), a method to improve their convergence at
different scales, leading to more robust performance when the number of layers
scales up, while also reaching the same performance with fewer iterations. We
show the effectiveness of this approach on different benchmarks and with
common ResNet- and Transformer-based PHNNs. The code is available at
https://github.com/ispamm/PHYDI.
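As a rough illustration of the ideas involved (a minimal sketch under stated assumptions, not the authors' implementation, which lives in the linked repository): PHNNs build their layers from parameterized hypercomplex multiplications (PHM), whose weight matrix is assembled as a sum of Kronecker products, and an identity-style initialization can be obtained, for example, by zero-initializing a learnable gate on each residual branch so that every block computes the identity map at the start of training. The class names, the gating mechanism, and the hyperparameters below are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the PHYDI authors' code):
# a parameterized hypercomplex multiplication (PHM) linear layer and a
# residual block that is initialized to behave as the identity function.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PHMLinear(nn.Module):
    """PHM layer: weight built as W = sum_i kron(A_i, S_i) over n sub-algebras."""

    def __init__(self, n: int, in_features: int, out_features: int):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # A_i parameterize the (learned) hypercomplex multiplication rule,
        # S_i are the small weight blocks combined via Kronecker products.
        self.A = nn.Parameter(torch.randn(n, n, n))
        self.S = nn.Parameter(0.02 * torch.randn(n, out_features // n, in_features // n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the full weight matrix from the Kronecker products.
        W = sum(torch.kron(self.A[i], self.S[i]) for i in range(self.n))
        return F.linear(x, W)


class IdentityInitPHMBlock(nn.Module):
    """Residual PHM block that computes the identity map at initialization."""

    def __init__(self, n: int, dim: int):
        super().__init__()
        self.branch = nn.Sequential(
            PHMLinear(n, dim, dim),
            nn.ReLU(),
            PHMLinear(n, dim, dim),
        )
        # Zero-initialized learnable gate: at initialization the residual
        # branch contributes nothing, so the block outputs its input unchanged
        # (one possible way to realize identity initialization).
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.gate * self.branch(x)


if __name__ == "__main__":
    block = IdentityInitPHMBlock(n=4, dim=16)
    x = torch.randn(8, 16)
    assert torch.allclose(block(x), x)  # identity function at initialization
```

At initialization the forward pass reduces to the skip connection, which is the property identity initialization aims for; during training the gate and the PHM weights move away from zero as needed.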
Related papers
- Geometry Aware Meta-Learning Neural Network for Joint Phase and Precoder Optimization in RIS [9.20186865054847]
We propose a complex-valued, geometry-aware meta-learning neural network that maximizes the weighted sum rate in a multi-user multiple-input single-output system.
We use a complex-valued neural network for the phase shifts and an Euler-inspired update for the precoder network.
Our approach outperforms existing neural network-based algorithms, offering higher weighted sum rates, lower power consumption, and significantly faster convergence.
arXiv Detail & Related papers (2024-09-17T15:20:23Z)
- Towards Explaining Hypercomplex Neural Networks [6.543091030789653]
Hypercomplex neural networks are gaining increasing interest in the deep learning community.
In this paper, we propose inherently interpretable PHNNs and quaternion-like networks.
We draw insights into how this unique branch of neural models operates.
arXiv Detail & Related papers (2024-03-26T17:58:07Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change once the networks are trained with better, architecture-dependent hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence [51.6943465041708]
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency.
We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips.
arXiv Detail & Related papers (2023-10-25T13:15:17Z)
- Convergence and scaling of Boolean-weight optimization for hardware reservoirs [0.0]
We analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimizing the readout layer of a random, recurrently connected neural network.
Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment.
arXiv Detail & Related papers (2023-05-13T12:15:25Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Auto-PINN: Understanding and Optimizing Physics-Informed Neural Architecture [77.59766598165551]
Physics-informed neural networks (PINNs) are revolutionizing science and engineering practice by bringing the power of deep learning to bear on scientific computation.
Here, we propose Auto-PINN, which applies Neural Architecture Search (NAS) techniques to PINN design.
A comprehensive set of pre-experiments using standard PDE benchmarks allows us to probe the structure-performance relationship in PINNs.
arXiv Detail & Related papers (2022-05-27T03:24:31Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Differentiable Neural Architecture Learning for Efficient Neural Network Design [31.23038136038325]
We introduce a novel architecture parameterization based on a scaled sigmoid function.
We then propose a general Differentiable Neural Architecture Learning (DNAL) method to optimize the neural architecture without the need to evaluate candidate neural networks.
arXiv Detail & Related papers (2021-03-03T02:03:08Z)
- Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as greater stability and better performance than their Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.