Related papers: A Survey of Retentive Network

A Survey of Retentive Network

URL: http://arxiv.org/abs/2506.06708v1
Date: Sat, 07 Jun 2025 08:09:26 GMT
Title: A Survey of Retentive Network
Authors: Haiqi Yang, Zhiyuan Li, Yi Chang, Yuan Wu
Abstract summary: Retentive Network (RetNet) represents a significant advancement in neural network architecture, offering an efficient alternative to the Transformer. RetNet introduces a retention mechanism that unifies the inductive bias of recurrence with the global dependency modeling of attention.
Score: 16.11958932344012
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retentive Network (RetNet) represents a significant advancement in neural network architecture, offering an efficient alternative to the Transformer. While Transformers rely on self-attention to model dependencies, they suffer from high memory costs and limited scalability when handling long sequences due to their quadratic complexity. To mitigate these limitations, RetNet introduces a retention mechanism that unifies the inductive bias of recurrence with the global dependency modeling of attention. This mechanism enables linear-time inference, facilitates efficient modeling of extended contexts, and remains compatible with fully parallelizable training pipelines. RetNet has garnered significant research interest due to its consistently demonstrated cross-domain effectiveness, achieving robust performance across machine learning paradigms including natural language processing, speech recognition, and time-series analysis. However, a comprehensive review of RetNet is still missing from the current literature. This paper aims to fill that gap by offering the first detailed survey of the RetNet architecture, its key innovations, and its diverse applications. We also explore the main challenges associated with RetNet and propose future research directions to support its continued advancement in both academic research and practical deployment.
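The abstract's central claim is that one retention mechanism supports both fully parallel training and linear-time recurrent inference. Below is a minimal pure-Python sketch of that duality. It assumes the basic single-head retention equations from the original RetNet paper (Sun et al., 2023) rather than anything specified in this survey. The function names and toy inputs are hypothetical, and the full architecture's xPos-style rotation, multi-scale decay, and normalization are deliberately omitted.

```python
# Simplified single-head retention: parallel form vs. recurrent form.
# Assumption: basic retention equations only; xPos-style rotation,
# multi-scale decay, and group normalization are omitted.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def retention_parallel(q, k, v, gamma):
    """Training-style form: (Q K^T * D) V, D[n][m] = gamma**(n-m) if n >= m else 0."""
    scores = matmul(q, transpose(k))  # (T x T) query-key scores
    T = len(q)
    # Causal mask combined with exponential decay.
    masked = [[scores[n][m] * (gamma ** (n - m)) if n >= m else 0.0
               for m in range(T)] for n in range(T)]
    return matmul(masked, v)

def retention_recurrent(q, k, v, gamma):
    """Inference-style form: S_n = gamma * S_{n-1} + k_n^T v_n;  o_n = q_n S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size (d_k x d_v) state
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the previous state and add the outer product k_n^T v_n.
        state = [[gamma * state[i][j] + k_n[i] * v_n[j] for j in range(d_v)]
                 for i in range(d_k)]
        # Read out with the current query: o_n = q_n S_n.
        outputs.append([sum(q_n[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs

# Toy check on hypothetical inputs: both forms should agree.
q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
k = [[1.0, 1.0], [0.0, 1.0], [1.0, -1.0]]
v = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]
par = retention_parallel(q, k, v, gamma=0.9)
rec = retention_recurrent(q, k, v, gamma=0.9)
assert all(abs(a - b) < 1e-9 for ra, rb in zip(par, rec) for a, b in zip(ra, rb))
```

Both functions compute the same outputs. The parallel form materializes a T × T matrix, which suits batched training. The recurrent form carries only a fixed d_k × d_v state, so its per-token inference cost does not grow with sequence length.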

Related papers

Optimal Depth of Neural Networks [2.1756081703276]
This paper introduces a formal theoretical framework to address the problem of determining the optimal depth of a neural network. We model the layer-by-layer evolution of hidden representations as a sequential decision process. We propose a novel and practical regularization term, $\mathcal{L}_{\rm depth}$, that encourages the network to learn representations amenable to efficient early exiting.
arXiv Detail & Related papers (2025-06-20T09:26:01Z)
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics [14.25304439234864]
We introduce a novel resilience prediction framework for complex networks, designed to tackle this issue through generative data augmentation of network topology and dynamics. Experimental results on three network datasets demonstrate that our proposed framework TDNetGen can achieve prediction accuracy of up to 85%-95%.
arXiv Detail & Related papers (2024-08-19T09:20:31Z)
State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era [59.279784235147254]
This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing. The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time.
arXiv Detail & Related papers (2024-06-13T12:51:22Z)
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era [59.279784235147254]
This survey is aimed at providing an overview of these trends framed under the unifying umbrella of Recurrence. It emphasizes novel research opportunities that become prominent when abandoning the idea of processing long sequences.
arXiv Detail & Related papers (2024-02-12T23:55:55Z)
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity. Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting. FSNet balances fast adaptation to recent changes with retrieval of similar old knowledge. Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z)
Online Estimation and Community Detection of Network Point Processes for Event Streams [12.211623200731788]
A common goal in network modeling is to uncover the latent community structure present among nodes. We propose a fast online variational inference algorithm for estimating the latent structure underlying dynamic event arrivals on a network. We demonstrate that online inference can obtain comparable performance, in terms of community recovery, to non-online variants.
arXiv Detail & Related papers (2020-09-03T15:39:55Z)
Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs. Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
arXiv Detail & Related papers (2020-06-22T08:44:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.