Recurrent Attention Networks for Long-text Modeling
- URL: http://arxiv.org/abs/2306.06843v1
- Date: Mon, 12 Jun 2023 03:28:33 GMT
- Title: Recurrent Attention Networks for Long-text Modeling
- Authors: Xianming Li, Zongxi Li, Xiaotian Luo, Haoran Xie, Xing Lee, Yingbin
Zhao, Fu Lee Wang, Qing Li
- Abstract summary: This paper proposes a novel long-document encoding model, Recurrent Attention Network (RAN), to enable the recurrent operation of self-attention.
RAN is capable of extracting global semantics in both token-level and document-level representations, making it inherently compatible with both sequential and classification tasks.
- Score: 14.710722261441822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-attention-based models have achieved remarkable progress in short-text
mining. However, the quadratic computational complexities restrict their
application in long text processing. Prior works have adopted the chunking
strategy to divide long documents into chunks and stack a self-attention
backbone with the recurrent structure to extract semantic representation. Such
an approach disables parallelization of the attention mechanism, significantly
increasing the training cost and raising hardware requirements. Revisiting the
self-attention mechanism and the recurrent structure, this paper proposes a
novel long-document encoding model, Recurrent Attention Network (RAN), to
enable the recurrent operation of self-attention. Combining the advantages from
both sides, the well-designed RAN is capable of extracting global semantics in
both token-level and document-level representations, making it inherently
compatible with both sequential and classification tasks, respectively.
Furthermore, RAN is computationally scalable as it supports parallelization on
long document processing. Extensive experiments demonstrate the long-text
encoding ability of the proposed RAN model on both classification and
sequential tasks, showing its potential for a wide range of applications.
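A minimal NumPy sketch of the chunk-and-recur idea described in the abstract, under stated assumptions: a single memory vector carried across chunk-level self-attention stands in for the recurrent state, single-head attention is used, and all names (recurrent_chunk_attention, chunk_size, alpha) are illustrative rather than the authors' implementation. The token-level outputs correspond to sequential tasks and the final memory to document-level classification.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (n_q, d), (n_k, d), (n_k, d) -> (n_q, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def recurrent_chunk_attention(x, chunk_size=64, alpha=0.5):
    """Process a long sequence chunk by chunk.

    Every chunk attends over its own tokens plus one memory vector that
    summarizes all previous chunks, so global context propagates without
    ever materializing an L x L attention matrix.
    """
    d = x.shape[-1]
    memory = np.zeros((1, d))        # document-level state carried across chunks
    token_states = []
    for start in range(0, len(x), chunk_size):
        chunk = x[start:start + chunk_size]
        kv = np.concatenate([memory, chunk], axis=0)   # memory visible to every token
        out = attention(chunk, kv, kv)                 # token-level representations
        memory = alpha * memory + (1 - alpha) * out.mean(axis=0, keepdims=True)
        token_states.append(out)
    return np.concatenate(token_states, axis=0), memory.squeeze(0)

tokens = np.random.randn(512, 32)                      # 512 toy token embeddings
token_repr, doc_repr = recurrent_chunk_attention(tokens)
print(token_repr.shape, doc_repr.shape)                # (512, 32) (32,)
```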
Related papers
- Recycled Attention: Efficient inference for long-context language models [54.00118604124301]
We propose Recycled Attention, an inference-time method which alternates between full context attention and attention over a subset of input tokens.
When performing partial attention, we recycle the attention pattern of a previous token that has performed full attention and attend only to the top K most attended tokens (a toy sketch of this selection step appears after this list).
Compared to previously proposed inference-time acceleration methods, which attend only to local context or to tokens with high accumulated attention scores, our approach flexibly chooses tokens that are relevant to the current decoding step.
arXiv Detail & Related papers (2024-11-08T18:57:07Z) - KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs)
This work provides a taxonomy of current methods and evaluates more than 10 state-of-the-art approaches across seven categories of long-context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z) - Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection [28.15184715270483]
Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility.
We propose a novel paradigm named Sparse RAG, which seeks to cut costs through sparsity.
Sparse RAG encodes retrieved documents in parallel, which eliminates latency introduced by long-range attention of retrieved documents.
arXiv Detail & Related papers (2024-05-25T11:10:04Z) - Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-step working strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z) - LOCOST: State-Space Models for Long Document Abstractive Summarization [76.31514220737272]
We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs.
With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns.
arXiv Detail & Related papers (2024-01-31T15:33:37Z) - Legal-HNet: Mixing Legal Long-Context Tokens with Hartley Transform [0.0]
We introduce a new hybrid Seq2Seq architecture, an attention-free encoder connected to an attention-based decoder, which performs quite well on existing summarization tasks (an illustrative sketch of Hartley-transform token mixing appears after this list).
This not only makes training models from scratch accessible to more people, but also contributes to the reduction of the carbon footprint during training.
arXiv Detail & Related papers (2023-11-09T01:27:54Z) - Attention Where It Matters: Rethinking Visual Document Understanding
with Selective Region Concentration [26.408343160223517]
We propose a novel end-to-end document understanding model called SeRum.
SeRum converts image understanding and recognition tasks into a local decoding process of the visual tokens of interest.
We show that SeRum achieves state-of-the-art performance on document understanding tasks and competitive results on text spotting tasks.
arXiv Detail & Related papers (2023-09-03T10:14:34Z) - Plug-and-Play Regulators for Image-Text Matching [76.28522712930668]
Exploiting fine-grained correspondence and visual-semantic alignments has shown great potential in image-text matching.
We develop two simple but quite effective regulators which efficiently encode the message output to automatically contextualize and aggregate cross-modal representations.
Experiments on MSCOCO and Flickr30K datasets validate that they can bring an impressive and consistent R@1 gain on multiple models.
arXiv Detail & Related papers (2023-03-23T15:42:05Z) - Efficient Long Sequence Encoding via Synchronization [29.075962393432857]
We propose a synchronization mechanism for hierarchical encoding.
Our approach first identifies anchor tokens across segments and groups them by their roles in the original input sequence.
Our approach is able to improve the global information exchange among segments while maintaining efficiency.
arXiv Detail & Related papers (2022-03-15T04:37:02Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)