Revisiting Sentence Union Generation as a Testbed for Text Consolidation
- URL: http://arxiv.org/abs/2305.15605v1
- Date: Wed, 24 May 2023 22:34:01 GMT
- Title: Revisiting Sentence Union Generation as a Testbed for Text Consolidation
- Authors: Eran Hirsch, Valentina Pyatkin, Ruben Wolhandler, Avi Caciularu, Asi
Shefer, Ido Dagan
- Abstract summary: We propose revisiting the sentence union generation task as an effective, well-defined testbed for assessing text consolidation capabilities.
We present a refined annotation methodology and tools for crowdsourcing sentence unions, and create the largest union dataset to date.
We then propose a comprehensive evaluation protocol for union generation, including both human and automatic evaluation.
- Score: 17.594941316215838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tasks involving text generation based on multiple input texts, such as
multi-document summarization, long-form question answering and contemporary
dialogue applications, challenge models for their ability to properly
consolidate partly-overlapping multi-text information. However, these tasks
entangle the consolidation phase with the often subjective and ill-defined
content selection requirement, impeding proper assessment of models'
consolidation capabilities. In this paper, we suggest revisiting the sentence
union generation task as an effective, well-defined testbed for assessing text
consolidation capabilities, decoupling the consolidation challenge from
subjective content selection. To support research on this task, we present
refined annotation methodology and tools for crowdsourcing sentence union,
create the largest union dataset to date and provide an analysis of its rich
coverage of various consolidation aspects. We then propose a comprehensive
evaluation protocol for union generation, including both human and automatic
evaluation. Finally, as baselines, we evaluate state-of-the-art language models
on the task, along with a detailed analysis of their capacity to address
multi-text consolidation challenges and their limitations.
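To make the task concrete, the following is a minimal sketch of what sentence union generation asks of a model. The example sentences and the zero-shot prompt wording are illustrative assumptions, not drawn from the paper's dataset or baselines.

```python
# Minimal sketch of the sentence union task: merge two partly-overlapping
# sentences into one sentence that contains all the information from both,
# stated only once. Example sentences and prompt wording are assumptions.

def union_prompt(sentence_1: str, sentence_2: str) -> str:
    """Build a zero-shot prompt asking a language model for a sentence union."""
    return (
        "Merge the two sentences into a single sentence that contains all "
        "the information from both, without repeating overlapping content "
        "and without adding new information.\n"
        f"Sentence 1: {sentence_1}\n"
        f"Sentence 2: {sentence_2}\n"
        "Union:"
    )

s1 = "The storm hit the coast on Monday, knocking out power for thousands."
s2 = "The storm that hit the coast forced hundreds of residents to evacuate."

print(union_prompt(s1, s2))
# A valid union consolidates the overlap ("the storm hit the coast") once
# while keeping each sentence's unique details, e.g.:
# "The storm hit the coast on Monday, knocking out power for thousands and
#  forcing hundreds of residents to evacuate."
```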
Related papers
- Multi-Review Fusion-in-Context [20.681734117825822]
Grounded text generation requires both content selection and content consolidation.
Recent works have proposed a modular approach, with separate components for each step.
This study lays the groundwork for further exploration of modular text generation in the multi-document setting.
arXiv Detail & Related papers (2024-03-22T17:06:05Z)
- LongWanjuan: Towards Systematic Measurement for Long Text Quality [102.46517202896521]
LongWanjuan is a dataset specifically tailored to enhance the training of language models for long-text tasks with over 160B tokens.
In LongWanjuan, we categorize long texts into holistic, aggregated, and chaotic types, enabling a detailed analysis of long-text quality.
We devise a data mixture recipe that strategically balances different types of long texts within LongWanjuan, leading to significant improvements in model performance on long-text tasks.
arXiv Detail & Related papers (2024-02-21T07:27:18Z)
- Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
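A rough sketch of the in-context-learning evaluation idea this entry describes is shown below; the rating dimensions, scale, and single demonstration are assumptions for illustration, not the paper's actual protocol.

```python
# Sketch of an LLM used as a multi-dimensional evaluator via in-context
# learning: the prompt carries a demonstration, and the model rates a new
# summary on one dimension. Dimensions, scale, and demo are assumptions.

DIMENSIONS = ["coherence", "consistency", "fluency", "relevance"]

def icl_eval_prompt(source: str, summary: str, dimension: str) -> str:
    """Build a one-shot prompt asking an LLM to rate a summary (1-5)."""
    demo = (
        "Source: The city council approved the new budget on Tuesday.\n"
        "Summary: The budget was approved.\n"
        f"{dimension.capitalize()} score (1-5): 4\n\n"
    )
    return (
        f"Rate the summary's {dimension} on a 1-5 scale.\n\n" + demo
        + f"Source: {source}\nSummary: {summary}\n"
        + f"{dimension.capitalize()} score (1-5):"
    )
```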
arXiv Detail & Related papers (2023-06-01T23:27:49Z)
- Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture these dimensions.
We propose a new LLM-based framework that provides a comprehensive evaluation by comparing generated and reference texts from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z)
- Controlled Text Reduction [15.102190738450092]
We formalize Controlled Text Reduction as a standalone task, whose input is a source text with marked ("highlighted") target spans.
A model then needs to generate a coherent text that includes all and only the target information.
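A toy illustration of this setting follows; the example text and the span encoding are assumptions for illustration, not taken from the paper.

```python
# Toy illustration of Controlled Text Reduction: the input is a source
# text with marked target spans; the output must be a coherent text that
# covers all and only the marked content. Example text and the span
# encoding are illustrative assumptions.

source = ("The factory, built in 1962, employs 300 people "
          "and will close next spring.")

# Target ("highlighted") spans, located by string search for clarity:
spans = ["The factory", "employs 300 people", "will close next spring"]
highlights = [(source.find(s), source.find(s) + len(s)) for s in spans]

print(highlights)
# A valid reduction recombines exactly these spans into fluent text,
# dropping the unmarked detail ("built in 1962"), e.g.:
# "The factory employs 300 people and will close next spring."
```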
arXiv Detail & Related papers (2022-10-24T17:59:03Z)
- NEWTS: A Corpus for News Topic-Focused Summarization [9.872518517174498]
This paper introduces the first topical summarization corpus, based on the well-known CNN/Dailymail dataset.
We evaluate a range of existing techniques and analyze the effectiveness of different prompting methods.
arXiv Detail & Related papers (2022-05-31T10:01:38Z)
- SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z)
- Evaluation of Abstractive Summarisation Models with Machine Translation in Deliberative Processes [23.249742737907905]
The dataset, drawn from deliberative processes, reflects the difficulties of combining multiple narratives, mostly of poor grammatical quality, into a single text.
We report an extensive evaluation of a wide range of abstractive summarisation models in combination with an off-the-shelf machine translation model.
We obtain promising results regarding the fluency, consistency and relevance of the summaries produced.
arXiv Detail & Related papers (2021-10-12T09:23:57Z)
- QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions [12.264795812337153]
We propose to align predicate-argument relations across texts, providing a scaffold for information consolidation.
Our setting exploits QA-SRL, utilizing question-answer pairs to capture predicate-argument relations.
Analyses show that our new task is semantically challenging, capturing content overlap beyond lexical similarity.
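The sketch below illustrates the alignment idea with a toy pair of sentences; the QA pairs and the index format are assumptions, not the paper's actual QA-SRL output.

```python
# Toy sketch of QA-Align: each sentence's predicate-argument relations are
# represented as QA-SRL question-answer pairs, and alignment links pairs
# expressing the same proposition across texts. QA pairs and the index
# format here are illustrative assumptions.

sent_a = "The company acquired the startup for $2 billion."
sent_b = "The startup was bought by the company last year."

qas_a = [("Who acquired something?", "The company"),
         ("What was acquired?", "the startup"),
         ("For how much?", "$2 billion")]
qas_b = [("Who bought something?", "the company"),
         ("What was bought?", "The startup"),
         ("When was something bought?", "last year")]

# A gold alignment pairs QAs that express the same proposition, capturing
# overlap beyond lexical similarity ("acquired" vs. "bought"):
alignment = [(0, 0), (1, 1)]
# The unaligned QAs ("$2 billion", "last year") mark content unique to
# each sentence.
```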
arXiv Detail & Related papers (2021-09-26T17:19:48Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive
Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug in queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)