Related papers: Equivariant Neural Functional Networks for Transformers

URL: http://arxiv.org/abs/2410.04209v1
Date: Sat, 5 Oct 2024 15:56:57 GMT
Title: Equivariant Neural Functional Networks for Transformers
Authors: Viet-Hoang Tran, Thieu N. Vo, An Nguyen The, Tho Tran Huu, Minh-Khoi Nguyen-Nhat, Thanh Tran, Duy-Tung Pham, Tan Minh Nguyen,
Abstract summary: This paper systematically explores neural functional networks (NFN) for transformer architectures. NFN are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data.
Score: 2.3963215252605172
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper systematically explores neural functional networks (NFN) for transformer architectures. NFN are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data and have proven valuable for tasks such as learnable optimizers, implicit data representations, and weight editing. While NFN have been extensively developed for MLP and CNN, no prior work has addressed their design for transformers, despite the importance of transformers in modern deep learning. This paper aims to address this gap by providing a systematic study of NFN for transformers. We first determine the maximal symmetric group of the weights in a multi-head attention module as well as a necessary and sufficient condition under which two sets of hyperparameters of the multi-head attention module define the same function. We then define the weight space of transformer architectures and its associated group action, which leads to the design principles for NFN in transformers. Based on these, we introduce Transformer-NFN, an NFN that is equivariant under this group action. Additionally, we release a dataset of more than 125,000 Transformer model checkpoints trained on two datasets with two different tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance.
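The abstract refers to the symmetry group of the weights in a multi-head attention module without spelling it out. As a minimal sketch, assuming the standard formulation in which head i computes softmax(X W_Q^(i) (X W_K^(i))^T / sqrt(d_k)) X W_V^(i) W_O^(i) and the heads are summed, the NumPy code below checks numerically that three kinds of weight transformation leave the module's output unchanged: an invertible d_k x d_k matrix A_i applied to the query weights and compensated by A_i^{-T} on the key weights, an invertible d_v x d_v matrix B_i applied to the value weights and compensated by B_i^{-1} on the output weights, and a reordering of the heads. These are the kind of symmetries the paper formalizes; the helper names (`multi_head_attention`, `transform_weights`) and the sizes are illustrative assumptions, and the exact maximal group and its conditions are those stated in the paper, not in this code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(X, WQ, WK, WV, WO):
    """Sum over heads of softmax(X Wq (X Wk)^T / sqrt(dk)) X Wv Wo.

    X: (n, D); WQ, WK: lists of (D, dk); WV: list of (D, dv); WO: list of (dv, D).
    """
    out = np.zeros((X.shape[0], WO[0].shape[1]))
    for wq, wk, wv, wo in zip(WQ, WK, WV, WO):
        scores = (X @ wq) @ (X @ wk).T / np.sqrt(wq.shape[1])
        out += softmax(scores) @ (X @ wv) @ wo
    return out


def transform_weights(WQ, WK, WV, WO, A, B, perm):
    # Hypothetical helper: per-head invertible maps A[i] (dk x dk) on the
    # query/key side and B[i] (dv x dv) on the value/output side, plus a
    # reordering of the heads given by `perm`.
    WQ2 = [WQ[p] @ A[p] for p in perm]
    WK2 = [WK[p] @ np.linalg.inv(A[p]).T for p in perm]  # cancels A in Q K^T
    WV2 = [WV[p] @ B[p] for p in perm]
    WO2 = [np.linalg.inv(B[p]) @ WO[p] for p in perm]    # cancels B in V Wo
    return WQ2, WK2, WV2, WO2


rng = np.random.default_rng(0)
n, D, dk, dv, h = 5, 8, 4, 4, 3  # arbitrary small sizes
X = rng.normal(size=(n, D))
WQ = [rng.normal(size=(D, dk)) for _ in range(h)]
WK = [rng.normal(size=(D, dk)) for _ in range(h)]
WV = [rng.normal(size=(D, dv)) for _ in range(h)]
WO = [rng.normal(size=(dv, D)) for _ in range(h)]

# Random Gaussian matrices are invertible with probability 1
A = [rng.normal(size=(dk, dk)) for _ in range(h)]
B = [rng.normal(size=(dv, dv)) for _ in range(h)]
perm = rng.permutation(h)

Y = multi_head_attention(X, WQ, WK, WV, WO)
Y_sym = multi_head_attention(X, *transform_weights(WQ, WK, WV, WO, A, B, perm))
print(np.allclose(Y, Y_sym))  # prints True
```

The check works because (X W_Q A)(X W_K A^{-T})^T = X W_Q W_K^T X^T and (X W_V B)(B^{-1} W_O) = X W_V W_O, while summing over heads does not depend on their order. An NFN that is equivariant under such a group must treat all of these weight sets consistently, since they define the same function.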

Related papers

Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models [0.0]
State-of-the-art models can have over a hundred transformer blocks, containing billions of trainable parameters, and are trained on trillions of tokens of text. We show that models using a transformer block configuration with three-layer FFNs, with fewer such blocks, outperform the standard two-layer configuration, delivering lower training loss with fewer total parameters in less time.
arXiv Detail & Related papers (2025-05-10T12:54:21Z)
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer [15.93436166506258]
Spiking Neural Networks have emerged as a promising energy-efficient alternative to traditional Artificial Neural Networks. This paper introduces Accurate Addition-Only Spiking Self-Attention (A$^2$OS$^2$A).
arXiv Detail & Related papers (2025-02-28T22:23:29Z)
Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
In neuromorphic computing, spiking neural networks (SNNs) perform inference tasks, offering significant efficiency gains for workloads involving sequential data. Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy. This paper investigates a wireless neuromorphic split computing architecture employing multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot [50.16171384920963]
The transformer architecture has prevailed in various deep learning settings. A one-layer transformer trained with gradient descent provably learns the sparse token selection task.
arXiv Detail & Related papers (2024-06-11T02:15:53Z)
FBPT: A Fully Binary Point Transformer [12.373066597900127]
This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model which has the potential to be widely applied and expanded in the fields of robotics and mobile devices. By compressing the weights and activations of a 32-bit full-precision network to 1-bit binary values, the proposed binary point cloud Transformer network significantly reduces the storage footprint and computational resource requirements. The primary focus of this paper is on addressing the performance degradation issue caused by the use of binary point cloud Transformer modules.
arXiv Detail & Related papers (2024-03-15T03:45:10Z)
Transformer Neural Autoregressive Flows [48.68932811531102]
Density estimation can be performed using Normalizing Flows (NFs). We propose a novel solution by exploiting transformers to define a new class of neural flows called Transformer Neural Autoregressive Flows (T-NAFs).
arXiv Detail & Related papers (2024-01-03T17:51:16Z)
Volume-Preserving Transformers for Learning Time Series Data with Structure [0.0]
We develop a transformer-inspired neural network and use it to learn a dynamical system. We change the activation function of the attention layer to imbue the transformer with structure-preserving properties. This is shown to be of great advantage when applying the neural network to learning the trajectory of a rigid body.
arXiv Detail & Related papers (2023-12-18T13:09:55Z)
Neural Functional Transformers [99.98750156515437]
This paper uses the attention mechanism to define a novel set of permutation equivariant weight-space layers called neural functional Transformers (NFTs). NFTs respect weight-space permutation symmetries while incorporating the advantages of attention, which have exhibited remarkable success across multiple domains. We also leverage NFTs to develop Inr2Array, a novel method for computing permutation invariant representations from the weights of implicit neural representations (INRs).
arXiv Detail & Related papers (2023-05-22T23:38:27Z)
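The idea behind attention-based weight-space layers can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the NFT architecture from that paper: it uses the standard fact that permuting the hidden neurons of an MLP (rows of W1 and b1, columns of W2) leaves its function unchanged, builds one hypothetical "token" per hidden neuron from its incoming weights, bias, and outgoing weights, and checks that single-head self-attention without positional encodings is permutation equivariant over those tokens, so permuting the neurons simply permutes the layer's outputs. The helpers `neuron_tokens`, `self_attention`, and `mlp` are illustrative names.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(T, WQ, WK, WV):
    # Single-head self-attention with no positional encoding; T: (m, d) tokens
    scores = (T @ WQ) @ (T @ WK).T / np.sqrt(WQ.shape[1])
    return softmax(scores) @ (T @ WV)


def neuron_tokens(W1, b1, W2):
    # Hypothetical encoding: one token per hidden neuron j made of its
    # incoming weights W1[j], bias b1[j], and outgoing weights W2[:, j]
    return np.concatenate([W1, b1[:, None], W2.T], axis=1)


def mlp(x, W1, b1, W2):
    # One-hidden-layer ReLU MLP
    return W2 @ np.maximum(W1 @ x + b1, 0.0)


rng = np.random.default_rng(0)
d_in, hidden, d_out = 3, 6, 2
W1 = rng.normal(size=(hidden, d_in))
b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(d_out, hidden))

# Permute the hidden neurons consistently across W1, b1, and W2
perm = rng.permutation(hidden)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

# The permuted weights define the same function
x = rng.normal(size=d_in)
same_function = np.allclose(mlp(x, W1, b1, W2), mlp(x, W1p, b1p, W2p))

# Attention over neuron tokens: permuting neurons permutes the outputs
d_tok = d_in + 1 + d_out
WQ, WK, WV = (rng.normal(size=(d_tok, 4)) for _ in range(3))
out = self_attention(neuron_tokens(W1, b1, W2), WQ, WK, WV)
out_p = self_attention(neuron_tokens(W1p, b1p, W2p), WQ, WK, WV)
equivariant = np.allclose(out[perm], out_p)

print(same_function, equivariant)  # prints True True
```

Equivariance follows because softmax(P M P^T) = P softmax(M) P^T for a permutation matrix P, so the attention output for the permuted tokens is the permuted output for the original tokens.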
FlowTransformer: A Transformer Framework for Flow-based Network Intrusion Detection Systems [0.0]
FlowTransformer is a novel approach for implementing transformer-based NIDSs. It allows the direct substitution of transformer components, including the input encoding, transformer, classification head, and the evaluation of these across any flow-based network dataset.
arXiv Detail & Related papers (2023-04-28T10:40:34Z)
Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture that benefits from both convolutional neural networks and transformers. This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
Flexible Transmitter Network [84.90891046882213]
Current neural networks are mostly built upon the MP model, which usually formulates the neuron as executing an activation function on the real-valued weighted aggregation of signals received from other neurons. We propose the Flexible Transmitter (FT) model, a novel bio-plausible neuron model with flexible synaptic plasticity. We present the Flexible Transmitter Network (FTNet), which is built on the most common fully-connected feed-forward architecture.
arXiv Detail & Related papers (2020-04-08T06:55:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.