BRIO: Bringing Order to Abstractive Summarization
- URL: http://arxiv.org/abs/2203.16804v1
- Date: Thu, 31 Mar 2022 05:19:38 GMT
- Title: BRIO: Bringing Order to Abstractive Summarization
- Authors: Yixin Liu, Pengfei Liu, Dragomir Radev, Graham Neubig
- Abstract summary: We propose a novel training paradigm which assumes a non-deterministic distribution, so that different candidate summaries are assigned probability mass according to their quality.
Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
- Score: 107.97378285293507
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Abstractive summarization models are commonly trained using maximum
likelihood estimation, which assumes a deterministic (one-point) target
distribution in which an ideal model will assign all the probability mass to
the reference summary. This assumption may lead to performance degradation
during inference, where the model needs to compare several system-generated
(candidate) summaries that have deviated from the reference summary. To address
this problem, we propose a novel training paradigm which assumes a
non-deterministic distribution so that different candidate summaries are
assigned probability mass according to their quality. Our method achieves a new
state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07
ROUGE-1) datasets. Further analysis also shows that our model can estimate
probabilities of candidate summaries that are more correlated with their level
of quality.
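As an illustration of the training objective this abstract describes, here is a minimal PyTorch sketch of a quality-ordered candidate loss in the spirit of BRIO's contrastive term: candidates are assumed pre-sorted from best to worst by ROUGE, scored by length-normalized log-probability, and ranked with a pairwise margin. The margin value, normalization exponent, and toy numbers are illustrative assumptions, not the authors' released implementation.

```python
import torch

def length_normalized_logprob(summed_logprobs: torch.Tensor,
                              lengths: torch.Tensor,
                              alpha: float = 1.0) -> torch.Tensor:
    """Score each candidate by its summed token log-probability divided by
    length^alpha, so longer candidates are not penalized purely for having
    more tokens."""
    return summed_logprobs / lengths.pow(alpha)

def candidate_ranking_loss(scores: torch.Tensor, margin: float = 0.001) -> torch.Tensor:
    """Pairwise margin loss over candidates sorted by decreasing quality
    (e.g. ROUGE against the reference): the score of a better candidate
    should exceed that of a worse one by at least (j - i) * margin."""
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            loss = loss + torch.clamp(scores[j] - scores[i] + (j - i) * margin, min=0)
    return loss

# toy usage: 4 candidates, already sorted best-to-worst by ROUGE
summed_logprobs = torch.tensor([-30.0, -28.0, -35.0, -40.0], requires_grad=True)
lengths = torch.tensor([20.0, 18.0, 25.0, 30.0])
scores = length_normalized_logprob(summed_logprobs, lengths)
loss = candidate_ranking_loss(scores)
loss.backward()  # gradients push better candidates toward higher scores
```

In the paper this ranking term is combined with the standard cross-entropy loss on the reference summary; only the ranking term is sketched here.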
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
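The entry above derives its own metrics; as a generic, hedged illustration of what a high-probability guarantee on an output distribution can look like, the sketch below pairs a Monte Carlo estimate with a Hoeffding confidence bound. The sampling function and event predicate are hypothetical stand-ins for a real model and property of interest.

```python
import math, random

def estimate_event_probability(sample_output, is_event, n_samples: int, delta: float):
    """Monte Carlo estimate of P(is_event(output)) with a two-sided
    Hoeffding bound: with probability >= 1 - delta the true probability
    lies within +/- eps of the estimate."""
    hits = sum(is_event(sample_output()) for _ in range(n_samples))
    p_hat = hits / n_samples
    eps = math.sqrt(math.log(2 / delta) / (2 * n_samples))
    return p_hat, eps

# toy stand-in for sampling from a model's output distribution
random.seed(0)
p_hat, eps = estimate_event_probability(
    sample_output=lambda: random.random() < 0.07,  # "unsafe output" w.p. 0.07
    is_event=lambda out: out,
    n_samples=10_000,
    delta=0.05,
)
print(f"P(event) = {p_hat:.3f} +/- {eps:.3f} (95% confidence)")
```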
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
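A minimal sketch of one plausible reading of the entry above: abstain wherever an (assumed pretrained) density-ratio estimate scores the input as unlikely under the idealized data distribution, and predict otherwise. The threshold, stand-in models, and rejection rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

def predict_with_rejection(model_proba, density_ratio, X, threshold: float = 0.5):
    """Abstain on inputs that a density-ratio model scores as unlikely
    under the idealized data distribution; otherwise return the
    classifier's argmax prediction. -1 marks rejection."""
    ratios = density_ratio(X)                  # r(x) ~= q(x) / p(x)
    preds = np.argmax(model_proba(X), axis=1)
    return np.where(ratios >= threshold, preds, -1)

# toy usage with hypothetical stand-in models
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
model_proba = lambda X: rng.dirichlet(np.ones(2), size=len(X))       # fake classifier
density_ratio = lambda X: np.exp(-np.linalg.norm(X, axis=1) / 3)     # fake ratio model
print(predict_with_rejection(model_proba, density_ratio, X))
```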
- PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation [8.527898482146103]
We propose a comprehensive sample-based method for assessing the quality of generative models.
The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution.
arXiv Detail & Related papers (2024-02-06T19:39:26Z)
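A hedged sketch of the general idea in the entry above: partition the sample space into regions, count how many samples from each set fall in each region, and test whether the two count vectors are consistent with a single distribution. The Voronoi construction and chi-squared test below follow the paper only loosely.

```python
import numpy as np
from scipy.stats import chi2_contingency

def two_sample_region_test(samples_a, samples_b, n_regions: int = 10, seed: int = 0):
    """Partition the space into Voronoi regions around reference points
    (drawn from the first set, for simplicity), count samples per region,
    and run a chi-squared test on the 2 x k contingency table. A small
    p-value suggests the two sets come from different distributions."""
    rng = np.random.default_rng(seed)
    refs = samples_a[rng.choice(len(samples_a), n_regions, replace=False)]
    assign = lambda x: np.argmin(np.linalg.norm(x[:, None] - refs[None], axis=-1), axis=1)
    counts = np.stack([np.bincount(assign(s), minlength=n_regions)
                       for s in (samples_a, samples_b)])
    counts = counts[:, counts.sum(axis=0) > 0]  # drop empty regions
    stat, p_value, _, _ = chi2_contingency(counts)
    return stat, p_value

rng = np.random.default_rng(1)
same = two_sample_region_test(rng.normal(size=(2000, 2)), rng.normal(size=(2000, 2)))
diff = two_sample_region_test(rng.normal(size=(2000, 2)), rng.normal(0.5, 1, size=(2000, 2)))
print(f"same-dist p={same[1]:.3f}, shifted p={diff[1]:.3g}")
```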
- MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization [55.60306377044225]
State-of-the-art summarization systems can generate highly fluent summaries.
These summaries, however, may contain factual inconsistencies and/or information not present in the source.
We introduce an alternative scheme based on standard information-theoretic measures in which the information present in the source and summary is directly compared.
arXiv Detail & Related papers (2023-01-28T23:08:25Z)
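A minimal sketch of the comparison step from the entry above, assuming multiple-choice questions have already been generated and answered by a QA model conditioned once on the source and once on the summary; total variation is used here as one example of a statistical distance between answer distributions.

```python
import numpy as np

def answer_distribution_distance(p_source, p_summary):
    """Average total-variation distance between answer distributions over
    the same multiple-choice questions, answered conditioned on the source
    and on the summary. Low distance suggests the summary is consistent
    with the source."""
    p_source, p_summary = np.asarray(p_source), np.asarray(p_summary)
    tv_per_question = 0.5 * np.abs(p_source - p_summary).sum(axis=1)
    return tv_per_question.mean()

# toy example: 2 questions, 4 answer options each (distributions are
# placeholders for a real QA model's output probabilities)
p_src = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
p_sum = [[0.6, 0.2, 0.1, 0.1], [0.9, 0.05, 0.03, 0.02]]
print(f"inconsistency score: {answer_distribution_distance(p_src, p_sum):.3f}")
```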
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
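A hedged PyTorch sketch of one min-max step for the entry above: a small network produces per-example weights (a batch-normalized parametric likelihood ratio), and the model minimizes the reweighted loss. The architecture and regularizer are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dro_step(model, adversary, x, y, opt_model, opt_adv, reg=1.0):
    """One min-max step: the adversary upweights hard examples, the model
    minimizes the reweighted loss. The penalty term (KL to uniform, up to
    an additive constant) bounds how far the adversary's reweighting can
    drift from the empirical distribution."""
    per_example = F.cross_entropy(model(x), y, reduction="none")

    # adversary ascent: maximize reweighted loss minus the penalty
    log_w = F.log_softmax(adversary(x).squeeze(-1), dim=0)
    w = log_w.exp() * len(y)  # weights with mean 1 over the batch
    adv_obj = (w * per_example.detach()).mean() - reg * (w * log_w).mean()
    opt_adv.zero_grad(); (-adv_obj).backward(); opt_adv.step()

    # model descent on the adversarially reweighted loss
    with torch.no_grad():
        w = F.softmax(adversary(x).squeeze(-1), dim=0) * len(y)
    loss = (w * per_example).mean()
    opt_model.zero_grad(); loss.backward(); opt_model.step()
    return loss.item()

# toy usage with hypothetical linear model and adversary
model, adversary = torch.nn.Linear(5, 3), torch.nn.Linear(5, 1)
opt_m = torch.optim.SGD(model.parameters(), lr=0.1)
opt_a = torch.optim.SGD(adversary.parameters(), lr=0.1)
x, y = torch.randn(32, 5), torch.randint(0, 3, (32,))
print(dro_step(model, adversary, x, y, opt_m, opt_a))
```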
- Optimizing model-agnostic Random Subspace ensembles [5.680512932725364]
We present a model-agnostic ensemble approach for supervised learning.
The proposed approach alternates between learning an ensemble of models using a parametric version of the Random Subspace approach and optimizing the parameters of the feature-sampling distribution.
We show that the proposed approach performs well, both in terms of prediction and feature ranking, on simulated and real-world datasets.
arXiv Detail & Related papers (2021-09-07T13:58:23Z)
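A minimal sketch of the alternating scheme in the entry above, assuming per-feature Bernoulli sampling probabilities updated with a score-function (REINFORCE-style) gradient estimate of validation accuracy; the base learner, reward, and update rule are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_parametric_random_subspace(X, y, X_val, y_val, n_rounds=10,
                                    n_models=20, lr=0.5, seed=0):
    """Alternate between (a) fitting an ensemble on feature subsets drawn
    from per-feature Bernoulli distributions and (b) nudging those
    probabilities toward subsets whose members score well on validation."""
    rng = np.random.default_rng(seed)
    p = np.full(X.shape[1], 0.5)
    for _ in range(n_rounds):
        masks = rng.random((n_models, X.shape[1])) < p
        masks[:, 0] |= ~masks.any(axis=1)  # guarantee at least one feature
        models = [LogisticRegression(max_iter=200).fit(X[:, m], y) for m in masks]
        rewards = np.array([mdl.score(X_val[:, m], y_val)
                            for mdl, m in zip(models, masks)])
        advantage = rewards - rewards.mean()
        # d/dp log P(mask) = mask/p - (1 - mask)/(1 - p)
        grad = (advantage[:, None] * (masks / p - (~masks) / (1 - p))).mean(axis=0)
        p = np.clip(p + lr * grad, 0.05, 0.95)
    return p, models, masks

X = np.random.default_rng(1).normal(size=(300, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # only features 0 and 1 matter
p, models, masks = fit_parametric_random_subspace(X[:200], y[:200], X[200:], y[200:])
print(np.round(p, 2))  # probabilities for informative features should grow
```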
- Uncertainty-Aware Abstractive Summarization [3.1423034006764965]
We propose a novel approach to summarization based on Bayesian deep learning.
We show that our variational equivalents of BART and PEGASUS can outperform their deterministic counterparts on multiple benchmark datasets.
With a reliable uncertainty measure, we can improve the end-user experience by filtering out generated summaries with high uncertainty.
arXiv Detail & Related papers (2021-05-21T06:36:40Z)
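A hedged sketch of the filtering idea in the entry above, assuming access to a stochastic (e.g. dropout-enabled) summarizer: score each input by the spread of sequence log-likelihoods across passes and keep only low-uncertainty outputs. The threshold, spread measure, and stand-in model are hypothetical.

```python
import numpy as np

def filter_by_uncertainty(generate_stochastic, articles, n_passes=8, max_std=0.15):
    """Monte Carlo style uncertainty filter: run several stochastic
    decoding passes per article, measure the spread of their sequence
    log-likelihoods, and keep only summaries with a small spread."""
    kept = []
    for article in articles:
        outs = [generate_stochastic(article) for _ in range(n_passes)]
        summaries, loglik = zip(*outs)          # each pass: (text, log-likelihood)
        if np.std(loglik) <= max_std:
            kept.append((article, summaries[int(np.argmax(loglik))]))
    return kept

# toy stand-in for a dropout-enabled summarizer returning (summary, log-lik);
# "hard" inputs get noisier likelihoods and should tend to be filtered out
rng = np.random.default_rng(0)
fake_model = lambda a: (f"summary of {a}", rng.normal(-1.0, 0.3 if "hard" in a else 0.05))
print(len(filter_by_uncertainty(fake_model, ["easy text", "hard text"] * 5)))
```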
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
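A hypothetical sketch loosely inspired by the entry above: score a candidate sample by whether a step on it also helps held-out data, measured by a gradient inner product. This is a generic proxy for "generalization ability as a loss", not the paper's actual objective.

```python
import torch

def generalization_score(model, loss_fn, candidate_batch, holdout_batch):
    """Score a candidate batch by the inner product of its loss gradient
    with the gradient on a held-out batch: positive alignment suggests
    training on the candidate would also reduce held-out loss."""
    def flat_grad(batch):
        x, y = batch
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, model.parameters())
        return torch.cat([g.flatten() for g in grads])
    return torch.dot(flat_grad(candidate_batch), flat_grad(holdout_batch)).item()

# toy usage with a hypothetical linear model
model = torch.nn.Linear(4, 2)
loss_fn = torch.nn.CrossEntropyLoss()
cand = (torch.randn(8, 4), torch.randint(0, 2, (8,)))
held = (torch.randn(16, 4), torch.randint(0, 2, (16,)))
print(generalization_score(model, loss_fn, cand, held))  # higher = more generalizable
```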
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.