Long-Span Dependencies in Transformer-based Summarization Systems
- URL: http://arxiv.org/abs/2105.03801v1
- Date: Sat, 8 May 2021 23:53:03 GMT
- Title: Long-Span Dependencies in Transformer-based Summarization Systems
- Authors: Potsawee Manakul and Mark J. F. Gales
- Abstract summary: Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks including document summarization.
One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows.
In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization.
- Score: 38.672160430296536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks, including document summarization. Typically these systems are trained by fine-tuning a large pre-trained model on the target task. One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows, so training or fine-tuning them for long document summarization can be challenging. In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization using two methods: local self-attention and explicit content selection. These approaches are compared across a range of network configurations. Experiments are carried out on standard long-span summarization tasks, including the Spotify Podcast, arXiv, and PubMed datasets. We demonstrate that by combining these methods we achieve state-of-the-art results on all three tasks in ROUGE scores. Moreover, without a large-scale GPU card, our approach achieves comparable or better results than existing approaches.
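The abstract names two complementary techniques for long inputs: local self-attention, which restricts each token to a fixed neighbourhood so attention cost grows with the window rather than the full sequence, and explicit content selection, which filters the input before abstractive generation. The sketch below is a minimal NumPy illustration of both ideas under assumed parameters (window size, sentence scores, selection budget); it is not the authors' implementation and omits the efficiency tricks a real local-attention kernel would use.

```python
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend only to positions within +/- window."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def local_self_attention(q, k, v, window):
    """Single-head scaled dot-product attention restricted to a local window.

    q, k, v: arrays of shape (seq_len, d). For clarity this masks a full
    (seq_len x seq_len) score matrix; efficient implementations compute only
    the banded scores, which is what makes local attention scale to long inputs.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(local_attention_mask(len(q), window), scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

def select_content(sentence_scores, k):
    """Toy content selection: keep the k highest-scoring sentences and
    restore their original document order before feeding the summarizer."""
    keep = np.argsort(sentence_scores)[::-1][:k]
    return sorted(keep.tolist())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(12, 8))                          # 12 tokens, hidden size 8
    print(local_self_attention(x, x, x, window=2).shape)  # (12, 8)
    print(select_content(rng.random(20), k=5))            # 5 sentence indices, in order
```

How the selected content is scored and how the window interacts with pre-trained position embeddings are design choices the paper explores; the functions above only show the shape of each operation.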
Related papers
- Jaeger: A Concatenation-Based Multi-Transformer VQA Model [0.13654846342364307]
Document-based Visual Question Answering poses a challenging task at the intersection of linguistic sense disambiguation and fine-grained multimodal retrieval.
We propose Jaeger, a concatenation-based multi-transformer VQA model.
Our approach has the potential to amplify the performance of these models through concatenation.
arXiv Detail & Related papers (2023-10-11T00:14:40Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - Long-Range Transformer Architectures for Document Understanding [1.9331361036118608]
Document Understanding (DU) was not left behind with first Transformer based models for DU dating from late 2019.
We introduce 2 new multi-modal (text + layout) long-range models for DU based on efficient implementations of Transformers for long sequences.
Relative 2D attention proved effective on dense text for both normal and long-range models.
arXiv Detail & Related papers (2023-09-11T14:45:24Z)
- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies [0.0]
We compare existing solutions to long-sequence modeling in terms of their pure mathematical formulation.
We then demonstrate that long context length does yield better performance, albeit application-dependent.
Inspired by emerging sparse models of huge capacity, we propose a machine learning system for handling million-scale dependencies.
arXiv Detail & Related papers (2023-02-13T09:47:31Z)
- Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis [12.269318291685753]
We show that simple neural models outperform the more complex BERT-based models.
Simple models are also more robust to variations in document length and text perturbations.
arXiv Detail & Related papers (2023-02-07T21:51:05Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing [110.4684789199555]
We introduce scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario"
This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules.
Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios.
arXiv Detail & Related papers (2022-02-02T08:00:21Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [14.412066456583917]
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.