On the Design Space Between Transformers and Recursive Neural Nets
- URL: http://arxiv.org/abs/2409.01531v1
- Date: Tue, 3 Sep 2024 02:03:35 GMT
- Title: On the Design Space Between Transformers and Recursive Neural Nets
- Authors: Jishnu Ray Chowdhury, Cornelia Caragea
- Abstract summary: Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR) are studied.
CRvNN pushes the boundaries of traditional RvNN, relaxing its discrete structure-wise composition and ending up with a Transformer-like structure.
NDR constrains the original Transformer to induce better structural inductive bias, ending up with a model that is close to CRvNN.
- Score: 64.862738244735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study two classes of models, Recursive Neural Networks (RvNNs) and Transformers, and show that a tight connection between them emerges from the recent development of two models - Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR). On one hand, CRvNN pushes the boundaries of traditional RvNN, relaxing its discrete structure-wise composition and ending up with a Transformer-like structure. On the other hand, NDR constrains the original Transformer to induce better structural inductive bias, ending up with a model that is close to CRvNN. Both models, CRvNN and NDR, show strong performance on algorithmic tasks and generalization settings where simpler forms of RvNNs and Transformers fail. We explore these "bridge" models in the design space between RvNNs and Transformers, formalize their tight connections, discuss their limitations, and propose ideas for future research.
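To make the bridge concrete, the toy PyTorch layer below (a sketch with assumed shapes and module names, not the authors' actual CRvNN or NDR formulation) shows how relaxing a hard compose-or-not decision into a learned soft gate turns an RvNN-style update into something attention-like; making the gate depend on pairwise scores over all positions would recover ordinary self-attention.
```python
import torch
import torch.nn as nn


class SoftAdjacentComposition(nn.Module):
    """Hypothetical layer: each position softly composes with its left
    neighbour instead of making a hard, discrete structural choice as a
    classic RvNN would. The gate p acts as a relaxed structural decision."""

    def __init__(self, d):
        super().__init__()
        self.gate = nn.Linear(2 * d, 1)     # soft "compose or not" decision
        self.compose = nn.Linear(2 * d, d)  # composition function

    def forward(self, x):                   # x: (batch, seq, d)
        left = torch.roll(x, shifts=1, dims=1)   # left neighbour (wrap-around at position 0 ignored for brevity)
        pair = torch.cat([left, x], dim=-1)
        p = torch.sigmoid(self.gate(pair))       # (batch, seq, 1)
        composed = torch.tanh(self.compose(pair))
        return p * composed + (1 - p) * x        # differentiable composition step
```
Stacking several such steps lets compositions percolate up a latent tree, which is roughly the regime in which CRvNN and NDR meet.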
Related papers
- Recurrent Neural Networks for Still Images [0.0]
We argue that RNNs can effectively handle still images by interpreting the pixels as a sequence.
We introduce a novel RNN design tailored for two-dimensional inputs, such as images, and a custom version of BiDirectional RNN (BiRNN) that is more memory-efficient than traditional implementations.
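As a rough illustration of the pixels-as-a-sequence idea (a generic sketch, not the paper's proposed 2D-aware design or its memory-efficient BiRNN), image rows can be fed to a bidirectional GRU one row per timestep:
```python
import torch
import torch.nn as nn


class RowSequenceRNN(nn.Module):
    """Read a still image as a sequence: each row of pixels is one timestep
    of a bidirectional GRU, and the final states feed a classifier."""

    def __init__(self, width, channels, hidden, num_classes):
        super().__init__()
        self.rnn = nn.GRU(width * channels, hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, img):                 # img: (batch, C, H, W)
        b, c, h, w = img.shape
        seq = img.permute(0, 2, 1, 3).reshape(b, h, c * w)  # rows as timesteps
        _, h_n = self.rnn(seq)              # h_n: (2, batch, hidden)
        return self.head(torch.cat([h_n[0], h_n[1]], dim=-1))
```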
arXiv Detail & Related papers (2024-09-10T06:07:20Z)
- Does Transformer Interpretability Transfer to RNNs? [0.6437284704257459]
Recent advances in recurrent neural network architectures have enabled RNNs to match or exceed the performance of equal-size transformers.
We show that it is possible to improve some of these techniques by taking advantage of RNNs' compressed state.
arXiv Detail & Related papers (2024-04-09T02:59:17Z)
- Gated recurrent neural networks discover attention [9.113450161370361]
Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers.
We show how RNNs equipped with linear recurrent layers interconnected by feedforward paths with multiplicative gating can implement self-attention.
Our findings highlight the importance of multiplicative interactions in neural networks and suggest that certain RNNs might be unexpectedly implementing attention under the hood.
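The core observation can be illustrated in a few lines: a linear recurrence that accumulates key-value outer products, read out multiplicatively by the query, computes (unnormalized, causal) linear attention. A minimal sketch of that equivalence, not the paper's construction:
```python
import torch


def linear_attention_as_recurrence(q, k, v):
    """Recurrent state S accumulates key-value outer products (a multiplicative
    interaction); the query reads it out multiplicatively, so each output
    equals causal, unnormalized linear attention.  q, k, v: (seq, d)."""
    S = torch.zeros(k.shape[-1], v.shape[-1])   # recurrent state
    outputs = []
    for t in range(q.shape[0]):
        S = S + torch.outer(k[t], v[t])         # linear state update
        outputs.append(q[t] @ S)                # multiplicative read-out q_t^T S_t
    return torch.stack(outputs)


q, k, v = torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8)
print(linear_attention_as_recurrence(q, k, v).shape)  # torch.Size([5, 8])
```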
arXiv Detail & Related papers (2023-09-04T19:28:54Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models up to 14 billion parameters, by far the largest dense RNN ever trained, and find that RWKV performs on par with similarly sized Transformers.
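The "train in parallel like a Transformer, decode like an RNN" point can be shown with a toy decay-weighted aggregation computed two equivalent ways; the function names and decay scheme below are simplifications in the spirit of RWKV's WKV mixing, not its actual equations:
```python
import torch


def wkv_parallel(w, k, v):
    """Parallel (training-time) form: every output is a causal, decay-weighted
    average over the past, computed with sequence-wide matrix ops.
    w: scalar decay, k: (T,) keys, v: (T, d) values."""
    T = k.shape[0]
    t = torch.arange(T, dtype=torch.float32)
    decay = torch.exp(-w * (t[:, None] - t[None, :])).tril()  # (T, T), causal
    weights = decay * torch.exp(k)                             # weight each past step
    return (weights @ v) / weights.sum(-1, keepdim=True)


def wkv_recurrent(w, k, v):
    """Recurrent (inference-time) form of the same average: two running sums,
    i.e. a constant-size state per step, exactly like an RNN."""
    num, den = torch.zeros(v.shape[-1]), torch.zeros(())
    a = torch.exp(torch.tensor(-w))
    out = []
    for t in range(k.shape[0]):
        num = a * num + torch.exp(k[t]) * v[t]
        den = a * den + torch.exp(k[t])
        out.append(num / den)
    return torch.stack(out)


k, v = torch.randn(6), torch.randn(6, 4)
assert torch.allclose(wkv_parallel(0.5, k, v), wkv_recurrent(0.5, k, v), atol=1e-5)
```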
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- Spiking Neural Network Decision Feedback Equalization [70.3497683558609]
We propose an SNN-based equalizer with a feedback structure akin to the decision feedback equalizer (DFE).
We show that our approach clearly outperforms conventional linear equalizers for three different exemplary channels.
The proposed SNN with a decision feedback structure enables the path to competitive energy-efficient transceivers.
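For reference, the classical decision feedback equalizer whose feedback structure the SNN mirrors fits in a few lines; this is the textbook DFE with hard BPSK decisions and assumed tap names, not the paper's spiking implementation:
```python
import numpy as np


def dfe_equalize(received, ff_taps, fb_taps):
    """Classical DFE: a feedforward filter over received samples minus a
    feedback filter over past decisions; each hard decision is fed back
    when equalizing later symbols (BPSK symbols assumed)."""
    decisions = np.zeros(len(received))
    for n in range(len(received)):
        ff = sum(ff_taps[i] * received[n - i]
                 for i in range(len(ff_taps)) if n - i >= 0)
        fb = sum(fb_taps[j] * decisions[n - 1 - j]
                 for j in range(len(fb_taps)) if n - 1 - j >= 0)
        decisions[n] = 1.0 if ff - fb >= 0 else -1.0   # hard decision
    return decisions
```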
arXiv Detail & Related papers (2022-11-09T09:19:15Z)
- Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance.
Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models with fewer model parameters can outperform the TT-DNN and CNN+(TT-DNN) counterparts.
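A minimal illustration of why a tensor-train (TT) factorization shrinks a layer; the mode sizes, rank, and contraction below are generic assumptions, not the paper's LR-TT-DNN or its Riemannian training:
```python
import torch

# Replace a dense 1024x1024 weight (~1.05M parameters) by two small TT cores.
m1, m2, n1, n2, r = 32, 32, 32, 32, 8
core1 = torch.randn(m1, n1, r)   # 8,192 parameters
core2 = torch.randn(r, m2, n2)   # 8,192 parameters


def tt_linear(x):
    """Apply the TT-factorized weight to x of shape (batch, n1 * n2):
    W[(a,i),(j,k)] = sum_r core1[a,j,r] * core2[r,i,k]."""
    x = x.reshape(-1, n1, n2)
    h = torch.einsum('bjk,rik->bjri', x, core2)   # contract over n2
    y = torch.einsum('bjri,ajr->bai', h, core1)   # contract over n1 and rank r
    return y.reshape(-1, m1 * m2)


x = torch.randn(4, n1 * n2)
print(tt_linear(x).shape)   # torch.Size([4, 1024])
```
The two cores hold 16,384 parameters in place of roughly a million, which is the kind of saving a TT-factorized layer provides.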
arXiv Detail & Related papers (2022-03-11T15:55:34Z)
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, have started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- Modeling Hierarchical Structures with Continuous Recursive Neural Networks [33.74585832995141]
Recursive Neural Networks (RvNNs) compose sequences according to their underlying hierarchical syntactic structure.
Traditional RvNNs are incapable of inducing the latent structure in a plain text sequence on their own.
We propose Continuous Recursive Neural Network (CRvNN) as a backpropagation-friendly alternative.
arXiv Detail & Related papers (2021-06-10T20:42:05Z)
- Convolutional Neural Networks with Gated Recurrent Connections [25.806036745901114]
The recurrent convolutional neural network (RCNN) is inspired by the abundant recurrent connections in the visual systems of animals.
We propose to modulate the receptive fields (RFs) of neurons by introducing gates to the recurrent connections.
The GRCNN was evaluated on several computer vision tasks including object recognition, scene text recognition and object detection.
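A toy version of gating the recurrent connections (a sketch of the idea with assumed layer shapes, not the paper's exact gated recurrent convolution layer):
```python
import torch
import torch.nn as nn


class GatedRecurrentConv(nn.Module):
    """Refine the same feedforward input over a few iterations; a sigmoid
    gate decides how much recurrent context to mix in at each location,
    which effectively modulates the neuron's receptive field."""

    def __init__(self, channels, steps=3):
        super().__init__()
        self.steps = steps
        self.recurrent = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feedforward):          # (batch, C, H, W)
        state = feedforward
        for _ in range(self.steps):
            g = torch.sigmoid(self.gate(torch.cat([feedforward, state], dim=1)))
            state = torch.relu(feedforward + g * self.recurrent(state))
        return state
```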
arXiv Detail & Related papers (2021-06-05T10:14:59Z)
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose using evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
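For context, the 1-bit convolution building block itself is usually just sign-binarized weights with a per-filter scale; a generic sketch of that block (the paper's contribution, the evolutionary search over group-convolution candidates, is not shown):
```python
import torch
import torch.nn.functional as F


def binary_conv2d(x, weight, bias=None, stride=1, padding=1):
    """1-bit convolution: binarize weights to {-1, +1} and rescale each output
    filter by the mean absolute value of its real-valued weights
    (straight-through gradient handling omitted)."""
    scale = weight.abs().mean(dim=(1, 2, 3), keepdim=True)  # per-filter scale
    w_bin = torch.sign(weight) * scale
    return F.conv2d(x, w_bin, bias, stride=stride, padding=padding)
```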
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.