Abstractive Summarization with Combination of Pre-trained
Sequence-to-Sequence and Saliency Models
- URL: http://arxiv.org/abs/2003.13028v1
- Date: Sun, 29 Mar 2020 14:00:25 GMT
- Title: Abstractive Summarization with Combination of Pre-trained
Sequence-to-Sequence and Saliency Models
- Authors: Itsumi Saito, Kyosuke Nishida, Kosuke Nishida, Junji Tomita
- Abstract summary: We investigate the effectiveness of combining saliency models that identify the important parts of the source text with pre-trained seq-to-seq models.
Most of the combination models outperformed a simple fine-tuned seq-to-seq model on both the CNN/DM and XSum datasets.
- Score: 11.420640383826656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained sequence-to-sequence (seq-to-seq) models have significantly
improved the accuracy of several language generation tasks, including
abstractive summarization. Although the fluency of abstractive summarization
has been greatly improved by fine-tuning these models, it is not clear whether
they can also identify the important parts of the source text to be included in
the summary. In this study, we investigated the effectiveness of combining
saliency models that identify the important parts of the source text with the
pre-trained seq-to-seq models through extensive experiments. We also proposed a
new combination model consisting of a saliency model that extracts a token
sequence from a source text and a seq-to-seq model that takes the sequence as
an additional input text. Experimental results showed that most of the
combination models outperformed a simple fine-tuned seq-to-seq model on both
the CNN/DM and XSum datasets, even when the seq-to-seq model had been pre-trained on
large-scale corpora. Moreover, on the CNN/DM dataset, the proposed combination
model exceeded the previous best-performing model by 1.33 points on ROUGE-L.
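Although no code accompanies this abstract, the proposed combination is easy to sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a Hugging Face BART checkpoint as the pre-trained seq-to-seq model, and the saliency scorer, the `extract_salient_tokens` helper, and the `</s>` separator are placeholders introduced only for this sketch.
```python
# Minimal sketch of the combination idea: a saliency step extracts a token
# sequence from the source, and the pre-trained seq-to-seq model receives the
# source plus that sequence as additional input text.
# NOTE: the saliency scorer below is a hypothetical placeholder, not the
# authors' saliency model, which is a trained component in the paper.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def extract_salient_tokens(source: str, top_k: int = 64) -> str:
    """Placeholder saliency model: score tokens and keep the top-k in source order."""
    tokens = source.split()
    scores = [len(t) for t in tokens]  # stand-in importance scores for illustration
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:top_k])
    return " ".join(tokens[i] for i in keep)

def summarize_with_saliency(source: str) -> str:
    salient = extract_salient_tokens(source)
    combined = source + " </s> " + salient  # separator choice is an assumption
    inputs = tokenizer(combined, return_tensors="pt", truncation=True, max_length=1024)
    summary_ids = model.generate(**inputs, num_beams=4, min_length=56, max_length=142)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summarize_with_saliency("Replace this with a long news article ..."))
```
The sketch mirrors only the "extract a token sequence, then feed it as additional input" pattern described in the abstract; the paper evaluates several combination variants with trained saliency models.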
Related papers
- Seq2seq is All You Need for Coreference Resolution [26.551602768015986]
We finetune a pretrained seq2seq transformer to map an input document to a tagged sequence encoding the coreference annotation.
Our model outperforms or closely matches the best coreference systems in the literature on an array of datasets.
arXiv Detail & Related papers (2023-10-20T19:17:22Z)
- Calibrating Likelihoods towards Consistency in Summarization Models [22.023863165579602]
We argue that the main reason summarization models produce summaries inconsistent with the source is that models trained with a maximum likelihood objective assign high probability to plausible sequences given the context.
In this work, we solve this problem by calibrating the likelihood of model generated sequences to better align with a consistency metric measured by natural language inference (NLI) models.
arXiv Detail & Related papers (2023-10-12T23:17:56Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
- DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models [15.913828295673705]
DiffuSeq is a diffusion model designed for sequence-to-sequence (Seq2Seq) text generation tasks.
We show that DiffuSeq achieves performance comparable to or even better than six established baselines.
A theoretical analysis reveals the connection between DiffuSeq and autoregressive/non-autoregressive models.
arXiv Detail & Related papers (2022-10-17T10:49:08Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets such as QAMR and SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations [27.244943870086175]
Sequence-to-sequence (seq2seq) models have been found to struggle at out-of-distribution compositional generalization.
We study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models.
arXiv Detail & Related papers (2021-04-15T14:15:14Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose a finite mixture regression (FMR) model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Multi-Fact Correction in Abstractive Text Summarization [98.27031108197944]
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single- or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency with the source text (a rough sketch of this masking-and-span-selection idea appears after this list).
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-10-06T02:51:02Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives that allow us to pre-train a Seq2Seq-based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
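As referenced in the Multi-Fact Correction entry above, the following is a rough sketch of the masking-and-span-selection idea: mask each entity in a generated summary and re-fill it with a span selected from the source by an extractive QA model. This is an illustration under assumptions only, not the Span-Fact implementation; the cloze-style question construction, the `correct_entities` helper, and the QA checkpoint are choices made for this sketch.
```python
# Rough illustration of masking-and-span-selection factual correction
# (assumptions only; not the Span-Fact models): mask each entity in the summary
# and re-select a replacement span from the source with an extractive QA model.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def correct_entities(summary: str, source: str, entities: list[str]) -> str:
    """`entities` is assumed to come from an external NER step (not shown)."""
    corrected = summary
    for ent in entities:
        # Build a cloze-style question by masking the entity in the summary.
        question = corrected.replace(ent, "what").rstrip(".") + "?"
        # Select the answer span from the source and substitute it iteratively.
        answer = qa(question=question, context=source)["answer"]
        corrected = corrected.replace(ent, answer)
    return corrected

print(correct_entities(
    summary="The meeting was held in Paris on Tuesday.",
    source="Officials confirmed that the meeting took place in Brussels on Tuesday.",
    entities=["Paris"],
))
```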