USB: A Unified Summarization Benchmark Across Tasks and Domains
- URL: http://arxiv.org/abs/2305.14296v2
- Date: Mon, 4 Dec 2023 15:53:50 GMT
- Title: USB: A Unified Summarization Benchmark Across Tasks and Domains
- Authors: Kundan Krishna, Prakhar Gupta, Sanjana Ramprasad, Byron C. Wallace,
Jeffrey P. Bigham, Zachary C. Lipton
- Abstract summary: We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks.
We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.
- Score: 68.82726887802856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the NLP community has produced numerous summarization benchmarks, none
provide the rich annotations required to simultaneously address many important
problems related to control and reliability. We introduce a Wikipedia-derived
benchmark, complemented by a rich set of crowd-sourced annotations, that
supports 8 interrelated tasks: (i) extractive summarization; (ii) abstractive
summarization; (iii) topic-based summarization; (iv) compressing selected
sentences into a one-line summary; (v) surfacing evidence for a summary
sentence; (vi) predicting the factual accuracy of a summary sentence; (vii)
identifying unsubstantiated spans in a summary sentence; (viii) correcting
factual errors in summaries. We compare various methods on this benchmark and
discover that on multiple tasks, moderately-sized fine-tuned models
consistently outperform much larger few-shot prompted language models. For
factuality-related tasks, we also evaluate existing heuristics to create
training data and find that training on them results in worse performance than
training on 20× less human-labeled data. Our articles draw from 6
domains, facilitating cross-domain analysis. On some tasks, the amount of
training data matters more than the domain where it comes from, while for other
tasks training specifically on data from the target domain, even if limited, is
more beneficial.
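To make the task structure concrete, below is a minimal Python sketch of how the benchmark's 8 tasks and their annotations could be organized for evaluation. The task identifiers, field names, and the `model.score` interface are illustrative assumptions, not the released USB schema or evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The 8 USB tasks as described in the abstract; identifiers are illustrative.
TASKS = [
    "extractive_summarization",          # (i)
    "abstractive_summarization",         # (ii)
    "topic_based_summarization",         # (iii)
    "sentence_compression",              # (iv)
    "evidence_extraction",               # (v)
    "factuality_prediction",             # (vi)
    "unsupported_span_identification",   # (vii)
    "factual_error_correction",          # (viii)
]

@dataclass
class AnnotatedExample:
    """One annotated instance; field names are hypothetical, not the released schema."""
    domain: str                                # one of the 6 source domains
    source_sentences: List[str]                # article sentences
    topic: Optional[str]                       # topic for topic-based summarization
    selected_indices: List[int]                # sentences chosen for the extractive summary
    summary_sentence: str                      # abstractive / compressed one-line summary
    evidence_indices: List[int]                # sentences supporting the summary sentence
    is_factual: bool                           # factual-accuracy label
    unsupported_spans: List[Tuple[int, int]]   # character spans lacking support
    corrected_sentence: Optional[str]          # summary with factual errors fixed

def evaluate(model, examples: List[AnnotatedExample], task: str) -> float:
    """Average a (model-defined) per-example score for one task."""
    assert task in TASKS
    scores = [model.score(example, task) for example in examples]
    return sum(scores) / max(len(scores), 1)
```

Such a harness would make it straightforward to compare a moderately-sized fine-tuned model against a few-shot prompted language model on the same annotated examples, task by task and domain by domain.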
Related papers
- $\texttt{COSMIC}$: Mutual Information for Task-Agnostic Summarization Evaluation [39.287235598507294]
We propose a novel task-oriented evaluation approach that assesses summarizers based on their capacity to produce summaries that are useful for downstream tasks, while preserving task outcomes.
We introduce COSMIC as a practical implementation of this metric, demonstrating its strong correlation with human judgment-based metrics and its effectiveness in predicting downstream task performance.
arXiv Detail & Related papers (2024-02-29T18:51:23Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little, or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Topic-Guided Sampling For Data-Efficient Multi-Domain Stance Detection [44.06173809190896]
Stance Detection is concerned with identifying the attitudes expressed by an author towards a target of interest.
This task spans a variety of domains ranging from social media opinion identification to detecting the stance for a legal claim.
We present a topic-guided diversity sampling technique and a contrastive objective that is used for fine-tuning a stance classifier.
arXiv Detail & Related papers (2023-06-01T15:00:39Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization [54.59104881168188]
UniSumm is a unified few-shot summarization model pre-trained with multiple summarization tasks.
SummZoo is a new benchmark to better evaluate few-shot summarizers.
arXiv Detail & Related papers (2022-11-17T18:54:47Z)
- Truth Discovery in Sequence Labels from Crowds [12.181422057560201]
Crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), have been deployed to assist in this purpose.
Existing literature on annotation aggregation assumes that annotations are independent and thus faces challenges when handling sequential label aggregation tasks.
We propose an optimization-based method that infers the ground truth labels using annotations provided by workers for sequential labeling tasks.
arXiv Detail & Related papers (2021-09-09T19:12:13Z)
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
- Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields [67.51177964010967]
We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks.
We find that explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.
arXiv Detail & Related papers (2020-05-01T07:11:34Z)