New Approaches to Long Document Summarization: Fourier Transform Based
Attention in a Transformer Model
- URL: http://arxiv.org/abs/2111.15473v1
- Date: Thu, 25 Nov 2021 18:03:41 GMT
- Title: New Approaches to Long Document Summarization: Fourier Transform Based
Attention in a Transformer Model
- Authors: Andrew Kiruluta, Andreas Lemos and Eric Lundy
- Abstract summary: We extensively redesign the newly introduced method of token mixing using Fourier Transforms (FNET) to replace the computationally expensive self-attention mechanism.
We also carry out long document summarization using established methods that are capable of processing over 8000 tokens.
All modifications showed better performance on the summarization task than when using the original FNET encoder in a transformer architecture.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we extensively redesign the newly introduced method of token
mixing using Fourier Transforms (FNET) to replace the computationally expensive
self-attention mechanism in a full transformer implementation on a long
document summarization task (> 512 tokens). As a baseline, we also carried out
long document summarization using established methods such as Longformer and
Big Bird transformer models that are capable of processing over 8000 tokens and
are currently the state-of-the-art methods for these types of problems. The
original FNET paper implemented this in an encoder-only architecture, while
abstractive summarization requires both an encoder and a decoder. Since such a
pretrained transformer model does not currently exist in the public domain, we
decided to implement a full transformer based on this Fourier token mixing
approach in an encoder/decoder architecture, which we trained starting with
GloVe embeddings for the individual words in the corpus. We investigated a
number of different extensions to the original FNET architecture and evaluated
them on their ROUGE F1-score performance on a summarization task. All
modifications showed better performance on the summarization task than when
using the original FNET encoder in a transformer architecture.
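The core idea above is to replace each self-attention sublayer in the encoder with a parameter-free 2D Fourier transform over the sequence and hidden dimensions, keeping only the real part, as in FNet. The following is a minimal PyTorch sketch of such an encoder layer; the class and argument names (FourierMixingEncoderLayer, d_ff) and the normalization placement are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FourierMixingEncoderLayer(nn.Module):
    """Encoder layer with FNet-style token mixing instead of self-attention.

    The mixing step is a 2D FFT over the (sequence, hidden) dimensions; only
    the real part is kept, so it adds no learned parameters. Names and the
    exact normalization placement here are illustrative assumptions.
    """

    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.mixing_norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )
        self.ff_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Mix tokens with a 2D FFT over the last two dims and keep the real part.
        mixed = torch.fft.fft2(x.float()).real.type_as(x)
        x = self.mixing_norm(x + mixed)   # residual + layer norm around the mixing step
        x = self.ff_norm(x + self.ff(x))  # residual + layer norm around the feed-forward
        return x

# Example: long inputs are cheap because there is no O(n^2) attention.
layer = FourierMixingEncoderLayer(d_model=256, d_ff=1024)
out = layer(torch.randn(2, 4096, 256))   # (2, 4096, 256)
```

Note that this covers only encoder-side mixing; the encoder/decoder summarizer described in the abstract still needs a decoder with causal and cross-attention (or a causal mixing variant), which is outside this sketch.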
Related papers
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator [24.690247474891958]
Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models.
Our model achieves state-of-the-art performances among all transformer-based models on the long-range modeling benchmark LRA.
For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART.
arXiv Detail & Related papers (2023-05-24T12:33:06Z)
- Revisiting Transformer-based Models for Long Document Classification [31.60414185940218]
In real-world applications, multi-page multi-paragraph documents are common and cannot be efficiently encoded by vanilla Transformer-based models.
We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers.
We observe a clear benefit from being able to process longer text, and, based on our results, we derive practical advice on applying Transformer-based models to long document classification tasks.
arXiv Detail & Related papers (2022-04-14T00:44:36Z)
- FNetAR: Mixing Tokens with Autoregressive Fourier Transforms [0.0]
We show that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling.
The autoregressive Fourier transform could likely be used for parameter reduction on most Transformer-based time-series prediction models.
arXiv Detail & Related papers (2021-07-22T21:24:02Z)
- Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
We propose a novel way to accelerate attention calculation for Transformers with relative positional encoding (RPE).
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT); a sketch of the underlying Toeplitz-via-FFT trick is given after this list.
arXiv Detail & Related papers (2021-06-23T17:51:26Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- FNet: Mixing Tokens with Fourier Transforms [0.578717214982749]
We show that Transformer encoder architectures can be massively sped up with limited accuracy costs.
We replace the self-attention sublayers with simple linear transformations that "mix" input tokens.
The resulting model, which we name FNet, scales very efficiently to long inputs.
arXiv Detail & Related papers (2021-05-09T03:32:48Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
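The entry on kernelized attention with relative positional encoding above relies on the fact that a Toeplitz matrix-vector product can be computed in O(n log n) by circulant embedding and the FFT. Below is a minimal NumPy sketch of just that step, with illustrative function names; it does not reproduce that paper's full kernelized attention.

```python
import numpy as np

def toeplitz_matvec_fft(first_col, first_row, v):
    """Compute T @ v in O(n log n), where T is the Toeplitz matrix with the
    given first column and first row (first_row[0] must equal first_col[0]).

    T is embedded in a circulant matrix of size 2n - 1, whose matrix-vector
    product is a circular convolution and so diagonalizes under the FFT.
    """
    n = len(v)
    # First column of the circulant embedding: [c0 .. c_{n-1}, r_{n-1} .. r_1].
    c = np.concatenate([first_col, first_row[:0:-1]])
    v_pad = np.concatenate([v, np.zeros(n - 1)])
    prod = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_pad))
    return prod[:n].real

# Quick check against the dense product.
n = 6
col = np.random.randn(n)
row = np.concatenate([[col[0]], np.random.randn(n - 1)])
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
v = np.random.randn(n)
assert np.allclose(T @ v, toeplitz_matvec_fft(col, row, v))
```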
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.