Evaluating Interactive Summarization: an Expansion-Based Framework
- URL: http://arxiv.org/abs/2009.08380v1
- Date: Thu, 17 Sep 2020 15:48:13 GMT
- Title: Evaluating Interactive Summarization: an Expansion-Based Framework
- Authors: Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael
Amsterdamer, Ido Dagan
- Abstract summary: We develop an end-to-end evaluation framework for interactive summarization.
Our framework includes a procedure for collecting real user sessions and evaluation measures that rely on standard summarization metrics, adapted to reflect interaction.
All of our solutions are intended to be released publicly as a benchmark.
- Score: 97.0077722128397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Allowing users to interact with multi-document summarizers is a promising
direction towards improving and customizing summary results. Different ideas
for interactive summarization have been proposed in previous work but these
solutions are highly divergent and incomparable. In this paper, we develop an
end-to-end evaluation framework for expansion-based interactive summarization,
which considers the accumulating information along an interactive session. Our
framework includes a procedure of collecting real user sessions and evaluation
measures relying on standards, but adapted to reflect interaction. All of our
solutions are intended to be released publicly as a benchmark, allowing
comparison of future developments in interactive summarization. We demonstrate
the use of our framework by evaluating and comparing baseline implementations
that we developed for this purpose, which will serve as part of our benchmark.
Our extensive experimentation and analysis of these systems motivate our design
choices and support the viability of our framework.
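To make the notion of "accumulating information along an interactive session" concrete, below is a minimal sketch of how a session-level, expansion-based evaluation could look: after each expansion step, the accumulated summary text is scored against a reference, producing a recall-vs-length curve. The function names and the simple unigram-recall proxy are illustrative assumptions, not the paper's actual measures or released code.

```python
# Hypothetical sketch of expansion-based session evaluation.
# A simple unigram-recall proxy stands in for the framework's
# standard-metric-based, interaction-adapted measures.

from typing import List, Tuple


def unigram_recall(accumulated_summary: str, reference: str) -> float:
    """Fraction of reference tokens covered by the accumulated summary."""
    summary_tokens = set(accumulated_summary.lower().split())
    reference_tokens = reference.lower().split()
    if not reference_tokens:
        return 0.0
    covered = sum(1 for tok in reference_tokens if tok in summary_tokens)
    return covered / len(reference_tokens)


def evaluate_session(expansions: List[str], reference: str) -> List[Tuple[int, int, float]]:
    """Score the accumulating summary after each expansion step.

    Returns (step, accumulated_word_count, recall) triples, from which a
    recall-vs-length curve (and, e.g., its area) can be derived.
    """
    accumulated: List[str] = []
    curve: List[Tuple[int, int, float]] = []
    for step, expansion in enumerate(expansions, start=1):
        accumulated.append(expansion)
        text = " ".join(accumulated)
        curve.append((step, len(text.split()), unigram_recall(text, reference)))
    return curve


if __name__ == "__main__":
    # Toy user session: each entry is the text revealed by one expansion.
    session = [
        "The committee approved the new budget.",
        "Opposition members criticized the spending cuts.",
        "A final vote is scheduled for next month.",
    ]
    reference = ("The committee approved the budget despite criticism of "
                 "spending cuts, with a final vote next month.")
    for step, length, recall in evaluate_session(session, reference):
        print(f"step {step}: {length} words, recall={recall:.2f}")
```

Scoring the accumulated text per step, rather than each expansion in isolation, reflects the framework's focus on the information a user has gathered over the whole session.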
Related papers
- Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs [29.72874725703848]
We introduce two concepts: Benchmark+, which extends traditional question-answer benchmarks into a more flexible "strategy-criterion" format; and Assessment+, which enhances the interaction process.
We propose an agent-based evaluation framework called TestAgent, which implements these concepts through retrieval augmented generation and reinforcement learning.
arXiv Detail & Related papers (2024-10-15T11:20:42Z)
- Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z)
- Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization [13.736656652049884]
Multimodal summarization aims to generate a concise summary based on the input text and image.
To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks.
arXiv Detail & Related papers (2024-02-18T01:03:25Z)
- Using Textual Interface to Align External Knowledge for End-to-End Task-Oriented Dialogue Systems [53.38517204698343]
We propose a novel paradigm that uses a textual interface to align external knowledge and eliminate redundant processes.
We demonstrate our paradigm in practice through MultiWOZ-Remake, including an interactive textual interface built for the MultiWOZ database.
arXiv Detail & Related papers (2023-05-23T05:48:21Z)
- Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback [15.22874706089491]
We introduce a new framework to train summarization models with preference feedback interactively.
By properly leveraging offline data and a novel reward model, we improve the performance regarding ROUGE scores and sample-efficiency.
arXiv Detail & Related papers (2022-04-12T03:56:59Z)
- FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows [63.116280145770006]
We propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.
To utilize segment act flows, sequences of segment acts, for evaluation, we develop the first consensus-based dialogue evaluation framework, FlowEval.
arXiv Detail & Related papers (2022-02-14T11:37:20Z)
- Evaluating Bayesian Model Visualisations [0.39845810840390733]
Probabilistic models inform an increasingly broad range of business and policy decisions ultimately made by people.
Recent algorithmic, computational, and software framework development progress facilitate the proliferation of Bayesian probabilistic models.
While they can empower decision makers to explore complex queries and to perform what-if-style conditioning in theory, suitable visualisations and interactive tools are needed to maximise users' comprehension and rational decision making under uncertainty.
arXiv Detail & Related papers (2022-01-10T19:15:39Z)
- iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization together with faceted search.
Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
- Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
arXiv Detail & Related papers (2020-04-17T03:51:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.