SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
- URL: http://arxiv.org/abs/2203.06569v2
- Date: Fri, 26 May 2023 05:48:29 GMT
- Title: SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
- Authors: Mathieu Ravaut, Shafiq Joty, Nancy F. Chen
- Abstract summary: We show that it is possible to directly train a second-stage model performing re-ranking on a set of summary candidates.
Our mixture-of-experts SummaReranker learns to select a better candidate and consistently improves the performance of the base model.
- Score: 26.114829566197976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence neural networks have recently achieved great success in
abstractive summarization, especially through fine-tuning large pre-trained
language models on the downstream dataset. These models are typically decoded
with beam search to generate a unique summary. However, the search space is
very large, and with the exposure bias, such decoding is not optimal. In this
paper, we show that it is possible to directly train a second-stage model
performing re-ranking on a set of summary candidates. Our mixture-of-experts
SummaReranker learns to select a better candidate and consistently improves the
performance of the base model. With a base PEGASUS, we push ROUGE scores by
5.44% on CNN-DailyMail (47.16 ROUGE-1), 1.31% on XSum (48.12 ROUGE-1) and 9.34%
on Reddit TIFU (29.83 ROUGE-1), reaching a new state-of-the-art. Our code and
checkpoints will be available at https://github.com/ntunlp/SummaReranker.
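In practice, the two-stage pipeline the abstract describes can be pictured as: decode several candidates from the base PEGASUS with diverse beam search, then let a trained scorer pick the best one. The sketch below is a minimal illustration built on Hugging Face transformers; the scorer is a placeholder callable, not the released SummaReranker model.

```python
# Minimal sketch of second-stage re-ranking (assumption: Hugging Face transformers).
# The scorer passed to rerank() is a generic placeholder, not the released SummaReranker.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
import torch

base_name = "google/pegasus-cnn_dailymail"
tokenizer = PegasusTokenizer.from_pretrained(base_name)
model = PegasusForConditionalGeneration.from_pretrained(base_name)

def generate_candidates(document: str, num_candidates: int = 8) -> list[str]:
    """Stage 1: decode several summary candidates with diverse beam search."""
    inputs = tokenizer(document, truncation=True, max_length=1024, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=num_candidates,
        num_beam_groups=num_candidates,   # diverse beam search, one beam per group
        diversity_penalty=1.0,
        num_return_sequences=num_candidates,
        max_length=128,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

def rerank(document: str, candidates: list[str], scorer) -> str:
    """Stage 2: score each (document, candidate) pair and keep the best candidate.

    `scorer` is any callable mapping (document, candidate) to a float.
    """
    scores = [scorer(document, c) for c in candidates]
    return candidates[int(torch.tensor(scores).argmax())]
```

As the title indicates, the actual second-stage model is a multi-task mixture-of-experts re-ranker; the placeholder scorer above only marks where it would plug in.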
Related papers
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
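As a rough illustration of that transformation (made-up dimensions and an untrained projection, not the paper's model), a dense vector can be projected onto the vocabulary and sparsified by keeping only the top-weighted terms:

```python
# Generic sketch: turn a dense vector from a frozen encoder into a sparse
# lexical vector by projecting onto the vocabulary and keeping the top-k terms.
# Shapes and the (untrained) projection are illustrative assumptions.
import torch

vocab_size, dense_dim, top_k = 30522, 768, 64
projection = torch.nn.Linear(dense_dim, vocab_size, bias=False)

def to_sparse_lexical(dense_vec: torch.Tensor) -> dict[int, float]:
    """Map a dense vector to {vocab_id: weight} with at most top_k nonzeros."""
    term_weights = torch.relu(projection(dense_vec))   # non-negative term weights
    weights, ids = torch.topk(term_weights, k=top_k)   # sparsify by top-k
    return {int(i): float(w) for i, w in zip(ids, weights) if w > 0}

dense = torch.randn(dense_dim)            # stand-in for a frozen dense model's output
print(len(to_sparse_lexical(dense)), "active terms")
```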
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
- Score Mismatching for Generative Modeling [4.413162309652114]
We propose a new score-based model with one-step sampling.
We train a standalone generator to compress all the time steps with the gradient backpropagated from the score network.
In order to produce meaningful gradients for the generator, the score network is trained to simultaneously match the real data distribution and mismatch the fake data distribution.
arXiv Detail & Related papers (2023-09-20T03:47:12Z)
- SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines on the widely used BEIR benchmark.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
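That kind of expansion can be quantified by counting active terms that never occur in the original text; a toy illustration with placeholder inputs (whitespace tokenization for brevity):

```python
# Toy check of term expansion: what fraction of a sparse representation's
# active terms do not occur in the original text? Inputs are placeholders.
def expansion_rate(sparse_terms: dict[str, float], original_text: str) -> float:
    original_tokens = set(original_text.lower().split())
    active = [term for term, weight in sparse_terms.items() if weight > 0]
    expanded = [term for term in active if term not in original_tokens]
    return len(expanded) / max(len(active), 1)

sparse_terms = {"beam": 1.2, "search": 0.9, "decode": 0.7, "algorithm": 0.4}
print(expansion_rate(sparse_terms, "beam search for summarization"))  # 0.5
```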
arXiv Detail & Related papers (2023-07-19T22:48:02Z)
- Generating EDU Extracts for Plan-Guided Summary Re-Ranking [77.7752504102925]
Two-step approaches, in which summary candidates are generated-then-reranked to return a single summary, can improve ROUGE scores over the standard single-step approach.
We design a novel method to generate candidates for re-ranking that addresses the redundancy and quality issues of candidates produced by standard decoding.
We show large relevance improvements over previously published methods on widely used single document news article corpora.
arXiv Detail & Related papers (2023-05-28T17:22:04Z)
- ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval [22.882301169283323]
We propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus.
Experiments on nine datasets demonstrate that ReGen achieves a 4.3% gain over the strongest baselines and saves around 70% of the time compared to baselines using large NLG models.
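The core retrieval loop can be pictured as matching class-descriptive queries against an unlabeled corpus and pseudo-labeling the top hits; the sketch below uses TF-IDF and toy data as stand-ins for the paper's progressive dense retriever:

```python
# Generic sketch of retrieval-based training-data creation: retrieve unlabeled
# documents that match a class-descriptive query and pseudo-label them with that
# class. TF-IDF and the toy corpus stand in for the paper's dense retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

unlabeled_corpus = [
    "the team won the championship game last night",
    "the central bank raised interest rates again",
    "new vaccine shows promise in clinical trials",
]
class_queries = {"sports": "sports team game score", "finance": "economy bank markets"}

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(unlabeled_corpus)

training_data = []
for label, query in class_queries.items():
    similarities = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    best = int(similarities.argmax())      # top match becomes a pseudo-labeled example
    training_data.append((unlabeled_corpus[best], label))

print(training_data)
```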
arXiv Detail & Related papers (2023-05-18T04:30:09Z)
- Towards Summary Candidates Fusion [26.114829566197976]
We propose a new paradigm in second-stage abstractive summarization called SummaFusion.
It fuses several summary candidates to produce a novel abstractive second-stage summary.
Our method works well on several summarization datasets, improving both the ROUGE scores and qualitative properties of fused summaries.
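One way to picture such fusion is to concatenate the source document with its candidates, separated by markers, and let a generic encoder-decoder write a new summary; the input format and the BART model below are illustrative assumptions, not the released SummaFusion code:

```python
# Illustrative fusion input: concatenate the source and its summary candidates
# with a plain-text marker and let a seq2seq model write a new summary.
# BART serves as a generic encoder-decoder here; "<cand>" is an ordinary string
# (a real system would likely register it as a special token).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def fuse(document: str, candidates: list[str]) -> str:
    fusion_input = document + " " + " ".join(f"<cand> {c}" for c in candidates)
    inputs = tokenizer(fusion_input, truncation=True, max_length=1024, return_tensors="pt")
    output = model.generate(**inputs, num_beams=4, max_length=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```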
arXiv Detail & Related papers (2022-10-17T06:48:05Z)
- Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization [108.09419317477986]
Z-Code++ is a new pre-trained language model optimized for abstractive text summarization.
The model is first pre-trained using text corpora for language understanding, and then is continually pre-trained on summarization corpora for grounded text generation.
Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the finetuned 200x larger GPT3-175B on SAMSum.
arXiv Detail & Related papers (2022-08-21T01:00:54Z)
- BRIO: Bringing Order to Abstractive Summarization [107.97378285293507]
We propose a novel training paradigm which assumes a non-deterministic distribution, assigning probability mass to different candidate summaries according to their quality.
Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
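Such a non-deterministic view is typically realized as a margin ranking loss over candidates sorted by quality (e.g., ROUGE), with length-normalized log-probabilities as model scores; a minimal sketch of that kind of loss, not necessarily the exact published objective:

```python
# Minimal margin ranking loss over candidate summaries sorted best-first
# (e.g., by ROUGE against the reference). Scores are assumed to be the model's
# length-normalized log-probabilities; this follows the general contrastive
# recipe rather than the exact published objective.
import torch

def candidate_ranking_loss(scores: torch.Tensor, margin: float = 0.01) -> torch.Tensor:
    """scores[i] = model score of the i-th candidate, already sorted best-first."""
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # a better candidate (i) should outscore a worse one (j)
            # by a margin growing with the rank gap (j - i)
            loss = loss + torch.clamp(scores[j] - scores[i] + (j - i) * margin, min=0)
    return loss

scores = torch.tensor([-0.8, -1.1, -1.5], requires_grad=True)  # best-first, well ordered
print(candidate_ranking_loss(scores))                           # tensor(0., ...)
```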
arXiv Detail & Related papers (2022-03-31T05:19:38Z)
- Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward [42.925345819778656]
We present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose the use of dual encoders (a sequential document encoder and a graph-structured encoder) to maintain the global context and local characteristics of entities.
Results show that our models produce significantly higher ROUGE scores than a variant without the knowledge graph as input on both New York Times and CNN/Daily Mail datasets.
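A compact way to picture the dual encoders is a Transformer over tokens alongside a small message-passing layer over entity nodes, whose memories a decoder would jointly attend to; dimensions and the simple graph layer below are illustrative assumptions, not ASGARD itself:

```python
# Illustrative dual encoders: a Transformer over tokens (global context) and a
# one-step message-passing layer over an entity graph (local entity structure).
# Sizes and the graph layer are assumptions for the sketch, not ASGARD itself.
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, d_model: int = 256, vocab_size: int = 10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.seq_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.node_proj = nn.Linear(d_model, d_model)

    def forward(self, token_ids, node_feats, adjacency):
        token_mem = self.seq_encoder(self.embed(token_ids))           # document encoder
        norm_adj = adjacency / adjacency.sum(-1, keepdim=True).clamp(min=1)
        node_mem = torch.relu(self.node_proj(norm_adj @ node_feats))  # graph encoder
        # A decoder would attend over both memories; here they are concatenated.
        return torch.cat([token_mem, node_mem], dim=1)

enc = DualEncoder()
tokens = torch.randint(0, 10000, (1, 12))    # toy token ids
nodes = torch.randn(1, 5, 256)               # toy entity-node features
adj = torch.eye(5).unsqueeze(0)              # toy adjacency (self-loops only)
print(enc(tokens, nodes, adj).shape)         # torch.Size([1, 17, 256])
```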
arXiv Detail & Related papers (2020-05-03T18:23:06Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
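Each objective asks the model to reinstate the original document from a corrupted view of itself; as a hypothetical example in that spirit, training pairs for a sentence-reordering objective could be built like this (naive period-based splitting for brevity):

```python
# Sketch of building (input, target) pairs for a reinstatement-style objective:
# shuffle a document's sentences and train a seq2seq model to reproduce the
# original order. The period-based splitter is a simplification.
import random

def sentence_reordering_pair(document: str, seed: int = 0) -> tuple[str, str]:
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)
    corrupted_input = ". ".join(shuffled) + "."
    target = ". ".join(sentences) + "."
    return corrupted_input, target

doc = "The storm hit the coast. Thousands lost power. Crews began repairs."
print(sentence_reordering_pair(doc))
```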
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.