A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models
- URL: http://arxiv.org/abs/2405.16504v1
- Date: Sun, 26 May 2024 09:57:45 GMT
- Title: A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models
- Authors: Itamar Zimerman, Ameen Ali, Lior Wolf
- Abstract summary: Recent advances in efficient sequence modeling have led to attention-free layers.
We present a unified view of these models, formulating such layers as implicit causal self-attention layers.
- Score: 54.50526986788175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation models. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention layers. The formulation includes most of their sub-components and is not limited to a specific part of the architecture. The framework compares the underlying mechanisms of different layers on similar grounds and provides a direct means for applying explainability methods. Our experiments show that our attention matrices and attribution method outperform an alternative, more limited formulation that was recently proposed for Mamba. For the other architectures, for which ours is the first method to provide such a view, it is effective and competitive on the relevant metrics compared with state-of-the-art transformer explainability methods. Our code is publicly available.
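To illustrate the core idea, consider a toy scalar gated linear recurrence h_t = a_t * h_{t-1} + b_t * x_t with readout y_t = c_t * h_t. Unrolling it gives y_t = sum over s <= t of alpha[t, s] * x_s, where alpha[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s, so the lower-triangular matrix alpha plays the role of an implicit causal attention matrix. The NumPy sketch below is a minimal toy illustration of this unrolling, not the paper's exact multi-channel construction; all names in it are hypothetical.

```python
import numpy as np

def implicit_attention_matrix(a, b, c):
    """Materialize the implicit causal attention matrix of the scalar
    gated linear recurrence h_t = a[t]*h[t-1] + b[t]*x[t], y_t = c[t]*h_t.

    a, b, c: arrays of shape (T,) holding the per-step gates.
    Returns alpha of shape (T, T) with
        alpha[t, s] = c[t] * (a[s+1] * ... * a[t]) * b[s]   for s <= t
    and zeros above the diagonal.
    """
    T = len(a)
    alpha = np.zeros((T, T))
    for t in range(T):
        decay = 1.0                      # running product a[s+1] * ... * a[t]
        for s in range(t, -1, -1):
            alpha[t, s] = c[t] * decay * b[s]
            decay *= a[s]                # extend the gate product one step back
    return alpha

# Sanity check: multiplying by alpha reproduces the recurrence output.
rng = np.random.default_rng(0)
T = 8
a, b, c, x = (rng.uniform(0.1, 1.0, T) for _ in range(4))
alpha = implicit_attention_matrix(a, b, c)

h, y = 0.0, []
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y.append(c[t] * h)
assert np.allclose(alpha @ x, np.array(y))
```

Rows of alpha can then be inspected like attention maps, which is the sense in which explainability tools built for transformers carry over to attention-free layers; the paper's actual formulation extends this view to the full vector-valued layers, including their gating sub-components.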
Related papers
- Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble [11.542472900306745]
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
arXiv Detail & Related papers (2024-03-24T18:43:04Z)
- LaCo: Large Language Model Pruning via Layer Collapse [63.973142426228016]
Transformer-based large language models (LLMs) are exhibiting a notable trend of size expansion.
We propose a concise layer-wise pruning method called Layer Collapse (LaCo), in which rear model layers collapse into a prior layer.
Experiments show that our method maintains an average task performance of over 80% at pruning ratios of 25-30%.
arXiv Detail & Related papers (2024-02-17T04:16:30Z)
- CLIP-QDA: An Explainable Concept Bottleneck Model [3.570403495760109]
We introduce an explainable algorithm, designed from a multi-modal foundation model, that performs fast and explainable image classification.
Our explanations compete with existing XAI methods while being faster to compute.
arXiv Detail & Related papers (2023-11-30T18:19:47Z)
- Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC [106.06185677214353]
Diffusion models have quickly become the prevailing approach to generative modeling in many domains.
We propose an energy-based parameterization of diffusion models which enables the use of new compositional operators.
We find these samplers lead to notable improvements in compositional generation across a wide set of problems.
arXiv Detail & Related papers (2023-02-22T18:48:46Z)
- Classification of BCI-EEG based on augmented covariance matrix [0.0]
We propose a new framework based on the augmented covariance matrix extracted from an autoregressive model to improve motor imagery classification.
We test our approach on several datasets and subjects using the MOABB framework.
arXiv Detail & Related papers (2023-02-09T09:04:25Z)
- Polynomial Networks in Deep Classifiers [55.90321402256631]
We cast the study of deep neural networks within a unifying framework.
Our framework provides insights on the inductive biases of each model.
The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.
arXiv Detail & Related papers (2021-04-16T06:41:20Z)
- Generative Archimedean Copulas [27.705956325584026]
We propose a new generative modeling technique for learning multidimensional cumulative distribution functions (CDFs) in the form of copulas.
We consider certain classes of copulas known as Archimedean and hierarchical Archimedean copulas, popular for their parsimonious representation and ability to model different tail dependencies.
arXiv Detail & Related papers (2021-02-22T20:45:40Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [55.28436972267793]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z)