A Survey of RWKV
- URL: http://arxiv.org/abs/2412.14847v2
- Date: Sun, 05 Jan 2025 13:54:06 GMT
- Title: A Survey of RWKV
- Authors: Zhiyuan Li, Tingyu Xia, Yi Chang, Yuan Wu
- Abstract summary: The Receptance Weighted Key Value (RWKV) model offers a novel alternative to the Transformer architecture. Unlike conventional Transformers, which depend heavily on self-attention, RWKV adeptly captures long-range dependencies with minimal computational demands. This paper seeks to fill this gap as the first comprehensive review of the RWKV architecture, its core principles, and its varied applications.
- Score: 16.618320854505786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Receptance Weighted Key Value (RWKV) model offers a novel alternative to the Transformer architecture, merging the benefits of recurrent and attention-based systems. Unlike conventional Transformers, which depend heavily on self-attention, RWKV adeptly captures long-range dependencies with minimal computational demands. By utilizing a recurrent framework, RWKV addresses some computational inefficiencies found in Transformers, particularly in tasks with long sequences. RWKV has recently drawn considerable attention for its robust performance across multiple domains. Despite its growing popularity, no systematic review of the RWKV model exists. This paper seeks to fill this gap as the first comprehensive review of the RWKV architecture, its core principles, and its varied applications, such as natural language generation, natural language understanding, and computer vision. We assess how RWKV compares to traditional Transformer models, highlighting its capability to manage long sequences efficiently and lower computational costs. Furthermore, we explore the challenges RWKV encounters and propose potential directions for future research and advancement. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/RWKV-Survey.
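To make the efficiency contrast with self-attention concrete, the following is a minimal sketch of the channel-wise "WKV" time-mixing recurrence used in RWKV (RWKV-4 style), under the usual notation of decay w, bonus u, keys k, values v, and receptance r. The NumPy code and its helper name are illustrative assumptions rather than the survey's reference implementation, and real implementations add log-space tricks for numerical stability that are omitted here.

```python
# Hedged sketch of the RWKV-4 "WKV" recurrence: each token is folded into a
# fixed-size running state, so a length-T sequence costs O(T) work instead of
# the O(T^2) of full self-attention. Stability tricks are deliberately omitted.
import numpy as np

def wkv_recurrence(k, v, w, u):
    """k, v: (T, C) keys/values; w: (C,) channel-wise decay; u: (C,) bonus.
    Returns the (T, C) weighted key-value outputs."""
    T, C = k.shape
    num = np.zeros(C)          # decayed running sum of exp(k_i) * v_i
    den = np.zeros(C)          # decayed running sum of exp(k_i)
    out = np.empty((T, C))
    for t in range(T):
        e_k = np.exp(k[t])
        # the current token receives an extra "bonus" weight exp(u)
        out[t] = (num + np.exp(u) * e_k * v[t]) / (den + np.exp(u) * e_k + 1e-9)
        # decay the state and absorb the current token: constant work per step
        num = np.exp(-w) * num + e_k * v[t]
        den = np.exp(-w) * den + e_k
    return out
```

The receptance then gates this output channel-wise, roughly o_t = W_o (sigmoid(r_t) * wkv_t), which plays the role of attention inside an RWKV block.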
Related papers
- RWKV-UNet: Improving UNet with Long-Range Cooperation for Effective Medical Image Segmentation [39.11918061481855]
We propose RWKV-UNet, a novel model that integrates the RWKV structure into the U-Net architecture.
This integration enhances the model's ability to capture long-range dependencies and improve contextual understanding.
We show that RWKV-UNet achieves state-of-the-art performance on various types of medical image segmentation.
arXiv Detail & Related papers (2025-01-14T22:03:00Z) - Tensor Product Attention Is All You Need [54.40495407154611]
Tensor Product Attention (TPA) is a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly.
TPA achieves improved model quality alongside memory efficiency.
We introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture for sequence modeling.
arXiv Detail & Related papers (2025-01-11T03:37:10Z) - Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration [47.26304397935705]
Image restoration aims to recover high-quality images from degraded inputs. Existing methods lack a unified training benchmark for iterations and configurations. We introduce a large-scale IR dataset called ReSyn, which employs a novel image filtering method based on image complexity.
arXiv Detail & Related papers (2024-12-05T02:11:51Z) - The Evolution of RWKV: Advancements in Efficient Language Modeling [0.0]
The paper reviews the development of the Receptance Weighted Key Value architecture, emphasizing its advancements in efficient language modeling.
We examine its core innovations, adaptations across various domains, and performance advantages over traditional models.
The paper also discusses challenges and future directions for RWKV as a versatile architecture in deep learning.
arXiv Detail & Related papers (2024-11-05T04:10:05Z) - PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE (Pyramid RNN Embeddings) with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning [56.14518823931901]
We present PointRWKV, a model of linear complexity derived from the RWKV model in the NLP field.
We first propose to explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states.
To extract local geometric features simultaneously, we design a parallel branch to encode the point cloud efficiently in a fixed radius near-neighbors graph with a graph stabilizer.
arXiv Detail & Related papers (2024-05-24T05:02:51Z) - Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures [99.20299078655376]
This paper introduces Vision-RWKV, a model adapted from the RWKV model used in the NLP field.
Our model is designed to efficiently handle sparse inputs and demonstrate robust global processing capabilities.
Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speeds and lower memory usage.
arXiv Detail & Related papers (2024-03-04T18:46:20Z) - RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks [42.27646976600047]
Traditional Recurrent Neural Network (RNN) architectures have historically held prominence in time series tasks.
Recent advancements in time series forecasting have seen a shift away from RNNs toward alternative architectures such as Transformers and CNNs.
We design an efficient RNN-based model for time series tasks, named RWKV-TS, with three distinctive features.
arXiv Detail & Related papers (2024-01-17T09:56:10Z) - RRWKV: Capturing Long-range Dependencies in RWKV [0.0]
The paper devises the Retrospected Receptance Weighted Key Value (RRWKV) architecture by incorporating a retrospecting ability into RWKV so that it can effectively absorb information.
RWKV exploits a linear tensor-product attention mechanism and achieves parallelized computation by deploying a time-sequential mode.
arXiv Detail & Related papers (2023-06-08T13:17:06Z) - RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs (a minimal streaming-state sketch of this inference mode follows this list).
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
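As referenced in the RWKV entry above, here is a hedged sketch of the recurrent ("RNN-mode") inference that gives RWKV constant per-token cost: the same WKV recurrence as in the sketch following the abstract, but carried one token at a time with only a fixed-size state, unlike a Transformer whose key/value cache grows with sequence length. Function and variable names are illustrative assumptions, not code from any of the listed papers.

```python
# Streaming one-token WKV step: the carried state (num, den) has a fixed size,
# so memory and per-token compute stay constant as generation proceeds.
import numpy as np

def wkv_step(k_t, v_t, state, w, u):
    """One token of the WKV recurrence; state = (num, den), each of shape (C,)."""
    num, den = state
    e_k = np.exp(k_t)
    out = (num + np.exp(u) * e_k * v_t) / (den + np.exp(u) * e_k + 1e-9)
    new_state = (np.exp(-w) * num + e_k * v_t, np.exp(-w) * den + e_k)
    return out, new_state

# Usage: start from a zero state and update it token by token.
C = 8
state = (np.zeros(C), np.zeros(C))
w, u = np.full(C, 0.5), np.zeros(C)
for _ in range(16):                                   # e.g. 16 generated tokens
    k_t, v_t = np.random.randn(C), np.random.randn(C)
    out, state = wkv_step(k_t, v_t, state, w, u)      # O(C) work per token
```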
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.