Related papers: Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling

Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling

URL: http://arxiv.org/abs/2003.13027v1
Date: Sun, 29 Mar 2020 14:00:17 GMT
Title: Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Authors: Dmitrii Aksenov and Juli\'an Moreno-Schneider and Peter Bourgonje and Robert Schwarzenberg and Leonhard Hennig and Georg Rehm
Abstract summary: We train a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset.
Score: 4.525267347429154
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modelling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on two datasets and show its superiority in a manual qualitative analysis.

Related papers

Can bidirectional encoder become the ultimate winner for downstream applications of foundation models? [1.8120356834558644]
Foundational models have the characteristics of pre-training, transfer learning, and self-supervised learning. BERT broke through the limitation of only using one-way methods for language modeling in pre-training by using a masked language model. This article analyzes one-way and bidirectional models based on GPT and BERT and compares their differences based on the purpose of the model.
arXiv Detail & Related papers (2024-11-27T03:31:14Z)
Efficient Machine Translation with a BiLSTM-Attention Approach [0.0]
This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model. The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the context information of the input sequence. Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset.
arXiv Detail & Related papers (2024-10-29T01:12:50Z)
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction [6.78974856327994]
Adverse Event (ADE) extraction is one of the core tasks in digital pharmacovigilance. We evaluate 19 Transformer-based models for ADE extraction on informal texts. At the end of our analyses, we identify a list of take-home messages that can be derived from the experimental data.
arXiv Detail & Related papers (2023-06-08T15:25:24Z)
Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing machine translation (NMT) studies mainly focus on developing dataset-specific models. We propose a versatile'' model, i.e., the Unified Model Learning for NMT (UMLNMT) that works with data from different tasks. OurNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets. Despite promising results, current models still suffer from generating factually inconsistent summaries. We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP) What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer on language modeling on the C4 data-set as well as text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z)
TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for under represented languages. We show that the use of noisy web crawled data instead of structured data is more convenient for such non-standardized language. Our best performing TunBERT model reaches or improves the state-of-the-art in all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
CoreLM: Coreference-aware Language Model Fine-Tuning [0.0]
We propose a Fine-Tuning framework, named CoreLM, that extends the architecture of current Pretrained Language Models. We make available information outside the contextual space of the model, which results in a better Language Model for a fraction of the computational cost. Our proposed model achieves a lower Perplexity in GUMBY and LAMBDADA datasets when compared to GPT2 and a fine-tuned version of GPT2 without any changes.
arXiv Detail & Related papers (2021-11-04T08:44:31Z)
Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to be able to learn solely from bilingual text (bitext) Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.