Learning by Semantic Similarity Makes Abstractive Summarization Better
- URL: http://arxiv.org/abs/2002.07767v2
- Date: Wed, 2 Jun 2021 05:02:43 GMT
- Title: Learning by Semantic Similarity Makes Abstractive Summarization Better
- Authors: Wonjin Yoon, Yoon Sun Yeo, Minbyul Jeong, Bong-Jun Yi, Jaewoo Kang
- Abstract summary: We compare summaries generated by a recent LM, BART, with the reference summaries from a benchmark dataset, CNN/DM.
Interestingly, model-generated summaries receive higher scores relative to reference summaries.
- Score: 13.324006587838522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By harnessing pre-trained language models, summarization models have
made rapid progress recently. However, the models are mainly assessed by automatic
evaluation metrics such as ROUGE. Although ROUGE is known to correlate positively
with human evaluation scores, it has been criticized for its
vulnerability and for the gap between its scores and actual summary quality. In
this paper, we compare the summaries generated by a recent LM, BART, with the
reference summaries from a benchmark dataset, CNN/DM, using a crowd-sourced human
evaluation metric. Interestingly, model-generated summaries receive higher scores
than reference summaries. Stemming from our experimental results, we first discuss
the intrinsic characteristics of the CNN/DM dataset, the progress of pre-trained
language models, and their ability to generalize on the training data. Finally,
we share our insights into the model-generated summaries and present our
thoughts on learning methods for abstractive summarization.
Related papers
- Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization [0.05852077003870416]
This work leverages a transformer-based BART model for human-like summarization.
After training and fine-tuning, the encoder-decoder model is tested on diverse sample articles.
The fine-tuned model's performance is compared with that of the baseline pretrained model.
Empirical results on BBC News articles highlight that the gold standard summaries written by humans are more factually consistent by 17%.
arXiv Detail & Related papers (2024-10-22T09:25:04Z)
- Information-Theoretic Distillation for Reference-less Summarization [67.51150817011617]
We present a novel framework to distill a powerful summarizer based on the information-theoretic objective for summarization.
We start off from Pythia-2.8B as the teacher model, which is not yet capable of summarization.
We arrive at a compact but powerful summarizer with only 568M parameters that performs competitively against ChatGPT.
arXiv Detail & Related papers (2024-03-20T17:42:08Z)
- mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z)
- Inverse Reinforcement Learning for Text Summarization [52.765898203824975]
We introduce inverse reinforcement learning (IRL) as an effective paradigm for training abstractive summarization models.
Experimental results across datasets in different domains demonstrate the superiority of our proposed IRL model for summarization over MLE and RL baselines.
arXiv Detail & Related papers (2022-12-19T23:45:05Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization [6.017006996402699]
We study generating abstractive summaries that are faithful and factually consistent with the given articles.
A novel contrastive learning formulation is presented, which leverages both reference summaries, as positive training data, and automatically generated erroneous summaries, as negative training data, to train summarization systems that are better at distinguishing between them.
arXiv Detail & Related papers (2021-09-19T20:05:21Z)
- Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate summary quality without reference summaries by unsupervised contrastive learning.
Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
- Learning to summarize from human feedback [18.964548137315333]
We show that it is possible to significantly improve summary quality by training a model to optimize for human preferences.
We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone.
Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning.
arXiv Detail & Related papers (2020-09-02T19:54:41Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
- On Faithfulness and Factuality in Abstractive Summarization [17.261247316769484]
We analyzed limitations of neural text generation models for abstractive document summarization.
We found that these models are highly prone to hallucinate content that is unfaithful to the input document.
We show that textual entailment measures better correlate with faithfulness than standard metrics.
arXiv Detail & Related papers (2020-05-02T00:09:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.