PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive
Summarization
- URL: http://arxiv.org/abs/2305.06647v2
- Date: Wed, 28 Feb 2024 09:12:35 GMT
- Title: PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive
Summarization
- Authors: Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan
- Abstract summary: This work proposes PROM, a new PhRase-level cOpying Mechanism that enhances attention on n-grams.
PROM adds an indicator layer to explicitly pick up tokens in n-grams that can be copied from the source, and calculates an auxiliary loss for the copying prediction.
In the zero-shot setting, PROM is used in self-supervised pre-training on raw corpora and provides new general baselines on a wide range of summarization datasets.
- Score: 139.242907155883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on the remarkable achievements of pre-trained language models in
abstractive summarization, the copying mechanism has proved helpful by
improving the factuality, stability, and overall performance. This work
proposes PROM, a new PhRase-level cOpying Mechanism that enhances attention on
n-grams, which can be applied to zero-shot summarization with pre-training.
PROM adds an indicator layer to explicitly pick up tokens in n-grams that can be
copied from the source, and calculates an auxiliary loss for the copying
prediction. Empirical studies show that PROM makes significant improvements in
fine-tuning on benchmarks. In the zero-shot setting, PROM is utilized in the
self-supervised pre-training on raw corpora and provides new general baselines
on a wide range of summarization datasets. Further analysis shows that PROM
performs more reasonable copying and contributes to faithfulness.
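To make the described mechanism concrete, here is a minimal sketch, assuming a PyTorch-style encoder-decoder, of what a phrase-level copy indicator with an auxiliary loss could look like. This is not the authors' implementation; the `CopyIndicator` name, the n-gram labeling heuristic, and the loss weight `lambda_copy` are illustrative assumptions.

```python
# Minimal sketch (not the released PROM code) of a per-token copy indicator over
# encoder states plus an auxiliary copying loss, as described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F  # used in the commented training step below

class CopyIndicator(nn.Module):
    """Predicts, per source token, the probability of belonging to a copied n-gram."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, encoder_hidden: torch.Tensor) -> torch.Tensor:
        # encoder_hidden: (batch, src_len, hidden) -> logits: (batch, src_len)
        return self.proj(encoder_hidden).squeeze(-1)

def copy_labels(src_ids: torch.Tensor, tgt_ids: torch.Tensor, n: int = 3) -> torch.Tensor:
    """Heuristic labels (an assumption): mark a source token 1 if it lies inside an
    n-gram that also appears verbatim in the reference summary."""
    labels = torch.zeros_like(src_ids, dtype=torch.float)
    for b in range(src_ids.size(0)):
        src = src_ids[b].tolist()
        tgt_ngrams = {tuple(tgt_ids[b, i:i + n].tolist())
                      for i in range(tgt_ids.size(1) - n + 1)}
        for i in range(len(src) - n + 1):
            if tuple(src[i:i + n]) in tgt_ngrams:
                labels[b, i:i + n] = 1.0
    return labels

# Training-step sketch: the auxiliary BCE loss is added to the usual generation loss.
# encoder_hidden, src_ids, tgt_ids, gen_loss, and lambda_copy are placeholders.
# indicator = CopyIndicator(hidden_size=768)
# logits = indicator(encoder_hidden)
# aux_loss = F.binary_cross_entropy_with_logits(logits, copy_labels(src_ids, tgt_ids))
# total_loss = gen_loss + lambda_copy * aux_loss
```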
Related papers
- Reconsidering Degeneration of Token Embeddings with Definitions for Encoder-based Pre-trained Language Models [20.107727903240065]
We propose DefinitionEMB to re-construct isotropically distributed and semantics-related token embeddings for encoder-based language models.
Our experiments demonstrate the effectiveness of leveraging definitions from Wiktionary to re-construct such embeddings.
arXiv Detail & Related papers (2024-08-02T15:00:05Z)
- RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring [0.0]
Reasoning Distillation-Based Evaluation (RDBE) integrates interpretability to elucidate the rationale behind model scores.
Our experimental results demonstrate the efficacy of RDBE across all scoring rubrics considered in the dataset.
arXiv Detail & Related papers (2024-07-03T05:49:01Z)
- A Scalable and Efficient Iterative Method for Copying Machine Learning Classifiers [0.802904964931021]
This paper introduces a novel sequential approach that significantly reduces the amount of computational resources needed to train or maintain a copy of a machine learning model.
The effectiveness of the sequential approach is demonstrated through experiments with synthetic and real-world datasets, showing significant reductions in time and resources while maintaining or improving accuracy.
arXiv Detail & Related papers (2023-02-06T10:07:41Z)
- Proposal Distribution Calibration for Few-Shot Object Detection [65.19808035019031]
In few-shot object detection (FSOD), the two-step training paradigm is widely adopted to mitigate the severe sample imbalance.
Unfortunately, the extreme data scarcity aggravates the proposal distribution bias, hindering the RoI head from evolving toward novel classes.
We introduce a simple yet effective proposal distribution calibration (PDC) approach to neatly enhance the localization and classification abilities of the RoI head.
arXiv Detail & Related papers (2022-12-15T05:09:11Z)
- From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader [130.45769668885487]
Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data.
PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
arXiv Detail & Related papers (2022-12-09T10:21:56Z)
- Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking [14.50690911709558]
Research Replication Prediction (RRP) is the task of predicting whether a published research result can be replicated or not.
In this work, we propose the Variational Contextual Consistency Sentence Masking (VCCSM) method to automatically extract key sentences.
Results of our experiments on the RRP and European Convention of Human Rights (ECHR) datasets demonstrate that VCCSM improves model interpretability for long-document classification tasks.
arXiv Detail & Related papers (2022-03-28T03:27:13Z)
- AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks [0.0]
This work proposes an attention-based sequence-to-sequence model for handwritten word recognition.
It exploits models pre-trained on scene text images as a starting point for tailoring handwriting recognition models.
The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset.
arXiv Detail & Related papers (2022-01-23T22:48:36Z)
- On the Copying Behaviors of Pre-Training for Neural Machine Translation [63.914940899327966]
Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up training and boost performance.
In this work, we identify a critical side effect of pre-training for NMT, which stems from the discrepancy between the training objectives of LM-based pre-training and NMT.
We propose a simple and effective method named copying penalty to control copying behaviors in decoding (an illustrative sketch follows after this list).
arXiv Detail & Related papers (2021-07-17T10:02:30Z)
- Neural BRDF Representation and Importance Sampling [79.84316447473873]
We present a compact neural-network-based representation of BRDF reflectance data.
We encode BRDFs as lightweight networks and propose a training scheme with adaptive angular sampling.
We evaluate encoding results on isotropic and anisotropic BRDFs from multiple real-world datasets.
arXiv Detail & Related papers (2021-02-11T12:00:24Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
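For the copying-penalty idea mentioned in the NMT entry above, the following is an illustrative sketch of one possible formulation: rescoring decoder hypotheses by how heavily they copy from the source. The exact formulation in that paper may differ; the penalty strength `alpha` and the copy-ratio definition here are assumptions.

```python
# Illustrative sketch only (not the paper's exact method): penalize hypotheses that
# copy a large fraction of their tokens from the source during beam rescoring.
def copy_ratio(hyp_tokens, src_tokens):
    """Fraction of hypothesis tokens that also occur in the source sentence."""
    src = set(src_tokens)
    return sum(t in src for t in hyp_tokens) / max(len(hyp_tokens), 1)

def penalized_score(log_prob, hyp_tokens, src_tokens, alpha=1.0):
    """Subtract a length-scaled copy penalty from the hypothesis log-probability."""
    return log_prob - alpha * copy_ratio(hyp_tokens, src_tokens) * len(hyp_tokens)

# Example: two candidates with equal model log-probability; the verbatim copy
# of the source receives the lower penalized score.
src = ["the", "cat", "sat", "on", "the", "mat"]
copied = ["the", "cat", "sat", "on", "the", "mat"]
paraphrase = ["a", "cat", "was", "sitting", "on", "a", "rug"]
print(penalized_score(-5.0, copied, src), penalized_score(-5.0, paraphrase, src))
```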
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the accuracy of the generated information and is not responsible for any consequences arising from its use.