Curriculum-Guided Abstractive Summarization
- URL: http://arxiv.org/abs/2302.01342v1
- Date: Thu, 2 Feb 2023 11:09:37 GMT
- Title: Curriculum-Guided Abstractive Summarization
- Authors: Sajad Sotudeh, Hanieh Deilamsalehy, Franck Dernoncourt, Nazli Goharian
- Abstract summary: Recent Transformer-based summarization models have provided a promising approach to abstractive summarization.
These models have two shortcomings: (1) they often perform poorly in content selection, and (2) their training strategy is inefficient, which restricts model performance.
In this paper, we explore two ways to compensate for these pitfalls. First, we augment the Transformer network with a sentence cross-attention module in the decoder, encouraging more abstraction of salient content.
- Score: 45.57561926145256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent Transformer-based summarization models have provided a promising
approach to abstractive summarization. They go beyond sentence selection and
extractive strategies to deal with more complicated tasks such as novel word
generation and sentence paraphrasing. Nonetheless, these models have two
shortcomings: (1) they often perform poorly in content selection, and (2) their
training strategy is inefficient, which restricts model performance. In
this paper, we explore two orthogonal ways to compensate for these pitfalls.
First, we augment the Transformer network with a sentence cross-attention
module in the decoder, encouraging more abstraction of salient content. Second,
we include a curriculum learning approach to reweight the training samples,
bringing about an efficient learning procedure. Our second approach, which
enhances the training strategy of Transformer networks, yields stronger gains
than the first. We apply our model to the extreme summarization dataset of
Reddit TIFU posts. We further look into three cross-domain summarization
datasets (Webis-TLDR-17, CNN/DM, and XSum), measuring the efficacy of
curriculum learning when applied in summarization. Moreover, a human evaluation
is conducted to show the efficacy of the proposed method in terms of
qualitative criteria, namely, fluency, informativeness, and overall quality.
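The abstract names a "sentence cross-attention module in the decoder" but gives no implementation details, so the following is only a minimal PyTorch sketch of what such a decoder layer could look like: a standard decoder block with one extra cross-attention over sentence-level source representations. All class, argument, and pooling choices here are illustrative assumptions, not the authors' code.

```python
# Minimal sketch only: the abstract mentions a "sentence cross-attention module
# in the decoder" without details; this layer and its names are assumptions.
import torch
import torch.nn as nn

class SentenceCrossAttnDecoderLayer(nn.Module):
    """Transformer decoder layer with an extra cross-attention over sentence states."""

    def __init__(self, d_model: int = 512, nhead: int = 8, dim_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.token_cross = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        # Assumed extra module: attention over sentence-level source representations.
        self.sent_cross = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])
        self.drop = nn.Dropout(dropout)

    def forward(self, tgt, token_states, sent_states, tgt_mask=None):
        # tgt:          (batch, tgt_len, d_model)  decoder hidden states
        # token_states: (batch, src_len, d_model)  token-level encoder outputs
        # sent_states:  (batch, n_sents, d_model)  e.g. mean-pooled tokens per source sentence
        # tgt_mask: causal mask for decoder self-attention (construction omitted for brevity)
        x = self.norms[0](tgt + self.drop(self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0]))
        x = self.norms[1](x + self.drop(self.token_cross(x, token_states, token_states)[0]))
        # The added step: re-attend to salient source content at sentence granularity.
        x = self.norms[2](x + self.drop(self.sent_cross(x, sent_states, sent_states)[0]))
        return self.norms[3](x + self.drop(self.ffn(x)))
```

Under the same assumption, one simple way to obtain `sent_states` is to mean-pool the token-level encoder outputs within each source sentence.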
Related papers
- An Active Learning Framework for Inclusive Generation by Large Language Models [32.16984263644299] (2024-10-17)
Large Language Models (LLMs) generate text representative of diverse sub-populations.
We propose a novel clustering-based active learning framework, enhanced with knowledge distillation.
We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models.
- CLearViD: Curriculum Learning for Video Description [3.5293199207536627] (2023-11-08)
Video description entails automatically generating coherent natural language sentences that narrate the content of a given video.
We introduce CLearViD, a transformer-based model for video description generation that leverages curriculum learning to accomplish this task.
The results on two datasets, namely ActivityNet Captions and YouCook2, show that CLearViD significantly outperforms existing state-of-the-art models in terms of both accuracy and diversity metrics.
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494] (2023-03-20)
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
- Learning with Rejection for Abstractive Text Summarization [42.15551472507393] (2023-02-16)
We propose a training objective for abstractive summarization based on rejection learning.
We show that our method considerably improves the factuality of generated summaries in automatic and human evaluations.
- An Imitation Learning Curriculum for Text Editing with Non-Autoregressive Models [22.996178360362734] (2022-03-17)
We show that imitation learning algorithms for machine translation introduce mismatches between training and inference that lead to undertraining and poor generalization in editing scenarios.
We show the efficacy of these strategies on two challenging English editing tasks: controllable text simplification and abstractive summarization.
- Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [101.26235068460551] (2020-10-24)
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
- On Learning Text Style Transfer with Direct Rewards [101.97136885111037] (2020-10-24)
Lack of parallel corpora makes it impossible to directly train supervised models for the text style transfer task.
We leverage semantic similarity metrics originally used for fine-tuning neural machine translation models.
Our model provides significant gains in both automatic and human evaluation over strong baselines.
- Multi-Fact Correction in Abstractive Text Summarization [98.27031108197944] (2020-10-06)
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text.
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
- Dynamic Data Selection and Weighting for Iterative Back-Translation [116.14378571769045] (2020-04-07)
We propose a curriculum learning strategy for iterative back-translation models (see the reweighting sketch after this list).
We evaluate our models on domain adaptation, low-resource, and high-resource MT settings.
Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
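Several of the items above (CLearViD, the imitation-learning curriculum, and Dynamic Data Selection and Weighting), like the main paper's second contribution, schedule or weight training samples by difficulty. The following is a generic, competence-based sample-reweighting sketch in PyTorch, not the recipe of the main paper or of any related paper listed here; the difficulty scores and the linear competence schedule are assumptions.

```python
# Generic curriculum-style sample reweighting sketch (assumptions throughout;
# not any specific paper's recipe from this page).
import torch

def competence(step: int, total_steps: int, c0: float = 0.1) -> float:
    # Fraction of the difficulty range admitted at this step (linear schedule, an assumption).
    return min(1.0, c0 + (1.0 - c0) * step / max(1, total_steps))

def curriculum_weights(difficulty: torch.Tensor, step: int, total_steps: int) -> torch.Tensor:
    # difficulty: (batch,) scores in [0, 1]; e.g. a baseline model's normalized
    # per-sample loss or a ROUGE gap, with harder examples scoring higher.
    c = competence(step, total_steps)
    if c >= 1.0:
        return torch.ones_like(difficulty)
    # Smoothly down-weight examples that are harder than the current competence.
    return torch.clamp(1.0 - (difficulty - c) / (1.0 - c), min=0.0, max=1.0)

def reweighted_loss(per_sample_loss: torch.Tensor, difficulty: torch.Tensor,
                    step: int, total_steps: int) -> torch.Tensor:
    # per_sample_loss: (batch,) token-averaged NLL of each reference summary.
    w = curriculum_weights(difficulty, step, total_steps)
    return (w * per_sample_loss).sum() / w.sum().clamp_min(1e-6)
```

With these assumed defaults, early steps give full weight only to the easiest samples and zero weight to the hardest; by the end of training all samples contribute equally, which is one common way to realize an easy-to-hard curriculum.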
This list is automatically generated from the titles and abstracts of the papers on this site.