Fovea Transformer: Efficient Long-Context Modeling with Structured
Fine-to-Coarse Attention
- URL: http://arxiv.org/abs/2311.07102v2
- Date: Thu, 11 Jan 2024 14:24:54 GMT
- Title: Fovea Transformer: Efficient Long-Context Modeling with Structured
Fine-to-Coarse Attention
- Authors: Ziwei He, Jian Yuan, Le Zhou, Jingwen Leng, Bo Jiang
- Abstract summary: We introduce Fovea Transformer, a long-context focused transformer.
We use representations of context tokens with a progressively coarser granularity in the tree, as their distance to the query token increases.
We evaluate our model on three long-context summarization tasks.
- Score: 17.48544285026157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The quadratic complexity of self-attention in Transformers has hindered the
processing of long text. To alleviate this problem, previous works have
proposed to sparsify the attention matrix, taking advantage of the observation
that crucial information about a token can be derived from its neighbors. These
methods typically combine one or another form of local attention and global
attention. Such combinations introduce abrupt changes in contextual granularity
when going from local to global, which may be undesirable. We believe that a
smoother transition could potentially enhance the model's ability to capture
long-context dependencies. In this study, we introduce Fovea Transformer, a
long-context focused transformer that addresses the challenges of capturing
global dependencies while maintaining computational efficiency. To achieve
this, we construct a multi-scale tree from the input sequence, and use
representations of context tokens with a progressively coarser granularity in
the tree, as their distance to the query token increases. We evaluate our model
on three long-context summarization tasks (our code is publicly available at
https://github.com/ZiweiHe/Fovea-Transformer). It achieves state-of-the-art
performance on two of them and competitive results on the third, with
improvements on some evaluation metrics and setbacks on others.
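To make the fine-to-coarse mechanism concrete, here is a minimal sketch in plain NumPy, assuming the multi-scale tree is built by repeated average pooling and that each coarser level contributes a small neighbourhood around the query's projected position. The function names, pooling choice, and window size are illustrative assumptions, not the authors' implementation; see the repository linked above for the actual code.

```python
# Minimal sketch of structured fine-to-coarse context selection.
# Assumptions (not taken from the paper's code): average pooling builds the
# tree, and every level contributes a fixed-size window around the query.
import numpy as np


def build_multiscale_tree(x, num_levels):
    """x: (seq_len, d) token representations. Level k halves the resolution
    of level k-1 by average pooling adjacent pairs of nodes."""
    levels = [x]
    for _ in range(num_levels - 1):
        prev = levels[-1]
        n = (prev.shape[0] // 2) * 2                 # drop a trailing odd node
        pooled = prev[:n].reshape(-1, 2, prev.shape[1]).mean(axis=1)
        levels.append(pooled)
    return levels


def gather_context(levels, q, window=4):
    """Collect the context a query at position q attends to: exact tokens
    near q, plus progressively coarser nodes around q's projection onto each
    level. A real implementation would also mask out regions already covered
    at a finer level; that bookkeeping is omitted here for brevity."""
    picked = []
    for k, level in enumerate(levels):
        qk = q >> k                                  # query position at level k
        picked.append(level[max(0, qk - window): qk + window + 1])
    return np.concatenate(picked, axis=0)


if __name__ == "__main__":
    tokens = np.random.randn(64, 16)                 # 64 tokens, 16-dim vectors
    tree = build_multiscale_tree(tokens, num_levels=4)
    context = gather_context(tree, q=20)
    print([lvl.shape for lvl in tree])               # [(64,16), (32,16), (16,16), (8,16)]
    print(context.shape)                             # the query attends over this set
```

Because each coarser level halves the number of nodes, the context set in this sketch grows only logarithmically with sequence length, which is where the efficiency over full attention comes from.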
Related papers
- Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing [19.73918716354272]
We propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing.
It achieves SOTA performance while running up to 30× faster than prior point Transformers on the ModelNet40, ShapeNet, and airborne MultiSpectral LiDAR (MS-LiDAR) datasets.
arXiv Detail & Related papers (2024-05-23T20:50:50Z) - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z) - Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for
Long Sequences [16.066338004414092]
Diffuser is a new efficient Transformer for sequence-to-sequence modeling.
It incorporates all token interactions within one attention layer while maintaining low computation and memory costs.
We show its ability to approximate full attention by analyzing the graph expander property from a spectral perspective.
arXiv Detail & Related papers (2022-10-21T08:13:34Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point
Cloud Learning [81.85951026033787]
We employ transformers in this work and incorporate them into a hierarchical framework for shape classification as well as part and scene segmentation.
We also compute efficient and dynamic global cross-attention by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art mean accuracy on shape classification and yields results on par with previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - A Context-Aware Feature Fusion Framework for Punctuation Restoration [28.38472792385083]
We propose a novel Feature Fusion framework based on two types of attention (FFA) to alleviate the shortage of attention.
Experiments on the popular benchmark dataset IWSLT demonstrate that our approach is effective.
arXiv Detail & Related papers (2022-03-23T15:29:28Z) - Fastformer: Additive Attention Can Be All You Need [51.79399904527525]
We propose Fastformer, which is an efficient Transformer model based on additive attention.
In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use an additive attention mechanism to model global contexts.
In this way, Fastformer can achieve effective context modeling with linear complexity (a minimal sketch of this additive-attention idea appears after this list).
arXiv Detail & Related papers (2021-08-20T09:44:44Z) - Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
We propose Combiner, which provides full attention capability in each attention head while maintaining low computation complexity.
We show that most sparse attention patterns used in existing sparse transformers can inspire the design of such a factorization for full attention.
An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
arXiv Detail & Related papers (2021-07-12T22:43:11Z) - Transformers Solve the Limited Receptive Field for Monocular Depth
Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - Cluster-Former: Clustering-based Sparse Transformer for Long-Range
Dependency Encoding [90.77031668988661]
Cluster-Former is a novel clustering-based sparse Transformer to perform attention across chunked sequences.
The proposed framework is pivoted on two unique types of Transformer layer: Sliding-Window Layer and Cluster-Former Layer.
Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks.
arXiv Detail & Related papers (2020-09-13T22:09:30Z)
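As referenced in the Fastformer entry above, here is a minimal sketch of additive attention with linear complexity. The scoring vectors w_q and w_k and the final element-wise key-value interaction are simplifying assumptions for illustration, not Fastformer's exact formulation.

```python
# Minimal sketch of additive attention: queries are attention-pooled into one
# global query vector, so no n x n score matrix is ever formed.
# w_q and w_k are hypothetical learned scoring vectors; this illustrates the
# idea described in the summary, not Fastformer's exact formulation.
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def additive_attention(Q, K, V, w_q, w_k):
    """Q, K, V: (n, d). Runs in O(n * d) instead of O(n^2 * d)."""
    alpha = softmax(Q @ w_q)        # (n,) importance of each query position
    global_q = alpha @ Q            # (d,) attention-pooled global query
    p = K * global_q                # (n, d) element-wise query-key interaction
    beta = softmax(p @ w_k)         # (n,) importance of each key position
    global_k = beta @ p             # (d,) attention-pooled global key
    return V * global_k             # (n, d) one context-aware vector per token


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 128, 32
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    w_q, w_k = rng.standard_normal(d), rng.standard_normal(d)
    print(additive_attention(Q, K, V, w_q, w_k).shape)   # (128, 32)
```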