Equivariant Neural Functional Networks for Transformers
- URL: http://arxiv.org/abs/2410.04209v1
- Date: Sat, 5 Oct 2024 15:56:57 GMT
- Title: Equivariant Neural Functional Networks for Transformers
- Authors: Viet-Hoang Tran, Thieu N. Vo, An Nguyen The, Tho Tran Huu, Minh-Khoi Nguyen-Nhat, Thanh Tran, Duy-Tung Pham, Tan Minh Nguyen
- Abstract summary: This paper systematically explores neural functional networks (NFN) for transformer architectures.
NFN are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data.
- Score: 2.3963215252605172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper systematically explores neural functional networks (NFN) for transformer architectures. NFN are specialized neural networks that treat the weights, gradients, or sparsity patterns of a deep neural network (DNN) as input data and have proven valuable for tasks such as learnable optimizers, implicit data representations, and weight editing. While NFN have been extensively developed for MLP and CNN, no prior work has addressed their design for transformers, despite the importance of transformers in modern deep learning. This paper aims to address this gap by providing a systematic study of NFN for transformers. We first determine the maximal symmetric group of the weights in a multi-head attention module as well as a necessary and sufficient condition under which two sets of hyperparameters of the multi-head attention module define the same function. We then define the weight space of transformer architectures and its associated group action, which leads to the design principles for NFN in transformers. Based on these, we introduce Transformer-NFN, an NFN that is equivariant under this group action. Additionally, we release a dataset of more than 125,000 Transformer model checkpoints trained on two datasets with two different tasks, providing a benchmark for evaluating Transformer-NFN and encouraging further research on transformer training and performance.
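The kind of weight-space symmetry the abstract refers to can be illustrated with a small numerical check. The sketch below is not the paper's code and does not claim to cover the full maximal group it characterizes. It assumes a standard multi-head attention written as a sum over heads with 1/sqrt(d_k) scaling, and the names `multi_head_attention`, `random_invertible`, and `transform_heads` are hypothetical. It checks that permuting the heads and applying invertible changes of basis inside each head leave the computed function unchanged.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads):
    # heads: list of (W_Q, W_K, W_V, W_O), one tuple per head.
    # Assumed form: sum over heads of softmax(Q K^T / sqrt(d_k)) V W_O.
    out = np.zeros((X.shape[0], heads[0][3].shape[1]))
    for W_Q, W_K, W_V, W_O in heads:
        d_k = W_Q.shape[1]
        scores = (X @ W_Q) @ (X @ W_K).T / np.sqrt(d_k)
        out = out + softmax(scores) @ (X @ W_V) @ W_O
    return out

def random_invertible(k, rng):
    # Orthogonal * positive diagonal * orthogonal: always invertible,
    # with condition number at most 4 for the range used here.
    Q1, _ = np.linalg.qr(rng.normal(size=(k, k)))
    Q2, _ = np.linalg.qr(rng.normal(size=(k, k)))
    s = rng.uniform(0.5, 2.0, size=k)
    return Q1 @ np.diag(s) @ Q2

def transform_heads(heads, rng):
    # Per head: W_Q -> W_Q A, W_K -> W_K A^{-T} keeps W_Q W_K^T fixed;
    #           W_V -> W_V B, W_O -> B^{-1} W_O keeps W_V W_O fixed.
    new_heads = []
    for W_Q, W_K, W_V, W_O in heads:
        A = random_invertible(W_Q.shape[1], rng)
        B = random_invertible(W_V.shape[1], rng)
        new_heads.append((W_Q @ A,
                          W_K @ np.linalg.inv(A).T,
                          W_V @ B,
                          np.linalg.inv(B) @ W_O))
    # Reordering the heads does not change their sum.
    perm = rng.permutation(len(new_heads))
    return [new_heads[i] for i in perm]

rng = np.random.default_rng(0)
n, d, d_k, d_v, h = 5, 8, 4, 4, 3   # illustrative sizes, chosen arbitrarily
X = rng.normal(size=(n, d))
heads = [(rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k)),
          rng.normal(size=(d, d_v)), rng.normal(size=(d_v, d)))
         for _ in range(h)]

original = multi_head_attention(X, heads)
transformed = multi_head_attention(X, transform_heads(heads, rng))
same_function = np.allclose(original, transformed)
print("outputs match:", same_function)
```

Two weight sets that differ only by such transformations are distinct points in weight space but the same function, which is why a network that takes transformer weights as input benefits from being equivariant to the corresponding group action.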
Related papers
- Binary Event-Driven Spiking Transformer [36.815359983551986]
Transformer-based Spiking Neural Networks (SNNs) introduce a novel event-driven self-attention paradigm.
We propose the Binary Event-Driven Spiking Transformer, i.e. BESTformer.
BESTformer suffers from a severe performance drop from its full-precision counterpart due to the limited representation capability of binarization.
arXiv Detail & Related papers (2025-01-10T12:00:11Z)
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
Split computing, where an SNN is partitioned across two devices, is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
- Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot [50.16171384920963]
The transformer architecture has prevailed in various deep learning settings.
One-layer transformer trained with gradient descent provably learns the sparse token selection task.
arXiv Detail & Related papers (2024-06-11T02:15:53Z)
- FBPT: A Fully Binary Point Transformer [12.373066597900127]
This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model which has the potential to be widely applied and expanded in the fields of robotics and mobile devices.
By compressing the weights and activations of a 32-bit full-precision network to 1-bit binary values, the proposed binary point cloud Transformer network significantly reduces the storage footprint and computational resource requirements.
The primary focus of this paper is on addressing the performance degradation issue caused by the use of binary point cloud Transformer modules.
arXiv Detail & Related papers (2024-03-15T03:45:10Z)
- Transformer Neural Autoregressive Flows [48.68932811531102]
Density estimation can be performed using Normalizing Flows (NFs).
We propose a novel solution by exploiting transformers to define a new class of neural flows called Transformer Neural Autoregressive Flows (T-NAFs).
arXiv Detail & Related papers (2024-01-03T17:51:16Z)
- Neural Functional Transformers [99.98750156515437]
This paper uses the attention mechanism to define a novel set of permutation equivariant weight-space layers called neural functional Transformers (NFTs).
NFTs respect weight-space permutation symmetries while incorporating the advantages of attention, which have exhibited remarkable success across multiple domains.
We also leverage NFTs to develop Inr2Array, a novel method for computing permutation invariant representations from the weights of implicit neural representations (INRs).
arXiv Detail & Related papers (2023-05-22T23:38:27Z)
- FlowTransformer: A Transformer Framework for Flow-based Network Intrusion Detection Systems [0.0]
FlowTransformer is a novel approach for implementing transformer-based NIDSs.
It allows the direct substitution of transformer components, including the input encoding, transformer, classification head, and the evaluation of these across any flow-based network dataset.
arXiv Detail & Related papers (2023-04-28T10:40:34Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Flexible Transmitter Network [84.90891046882213]
Current neural networks are mostly built upon the MP model, which usually formulates the neuron as executing an activation function on the real-valued weighted aggregation of signals received from other neurons.
We propose the Flexible Transmitter (FT) model, a novel bio-plausible neuron model with flexible synaptic plasticity.
We present the Flexible Transmitter Network (FTNet), which is built on the most common fully-connected feed-forward architecture.
arXiv Detail & Related papers (2020-04-08T06:55:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.