Guided Exploration of Data Summaries
- URL: http://arxiv.org/abs/2205.13956v1
- Date: Fri, 27 May 2022 13:06:27 GMT
- Title: Guided Exploration of Data Summaries
- Authors: Brit Youngmann, Sihem Amer-Yahia, and Aur\'elien Personnaz
- Abstract summary: A useful summary contains k individually uniform sets that are collectively diverse so as to be representative.
Finding such a summary is a difficult task when the data is highly diverse and large.
We examine the applicability of Exploratory Data Analysis (EDA) to data summarization and formalize Eda4Sum.
- Score: 24.16170440895994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data summarization is the process of producing interpretable and
representative subsets of an input dataset. It is usually performed following a
one-shot process with the purpose of finding the best summary. A useful summary
contains k individually uniform sets that are collectively diverse so as to be
representative. Uniformity addresses interpretability and diversity addresses
representativity. Finding such a summary is a difficult task when the data is
highly diverse and large. We examine the applicability of Exploratory Data
Analysis (EDA) to data summarization and formalize Eda4Sum, the problem of
guided exploration of data summaries that seeks to sequentially produce
connected summaries with the goal of maximizing their cumulative utility.
Eda4Sum generalizes one-shot summarization. We propose to solve it with one of
two approaches: (i) Top1Sum which chooses the most useful summary at each step;
(ii) RLSum which trains a policy with Deep Reinforcement Learning that rewards
an agent for finding a diverse and new collection of uniform sets at each step.
We compare these approaches with one-shot summarization and top-performing EDA
solutions. We run extensive experiments on three large datasets. Our results
demonstrate the superiority of our approaches for summarizing very large data,
and the need to provide guidance to domain experts.
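To make the greedy Top1Sum idea concrete, here is a minimal, hypothetical sketch of one exploration step: score every candidate summary reachable from the current one and keep the highest-utility candidate. The utility used here (mean within-set uniformity plus across-set diversity) and the names `uniformity`, `diversity`, `top1sum_step`, `item_sim`, and `set_dist` are illustrative assumptions, not the paper's actual measures or API.

```python
# Hypothetical sketch of a Top1Sum-style greedy step: at each iteration, score
# every candidate summary reachable from the current one and keep the best.
# The utility below (mean within-set uniformity + across-set diversity) is a
# stand-in; the paper's actual uniformity and diversity measures may differ.
from itertools import combinations
from typing import Callable, List, Sequence, Set

Summary = List[Set[str]]  # a summary is k sets of item identifiers


def uniformity(s: Set[str], item_sim: Callable[[str, str], float]) -> float:
    """Average pairwise similarity inside one set (higher = more uniform)."""
    pairs = list(combinations(sorted(s), 2))
    if not pairs:
        return 1.0
    return sum(item_sim(a, b) for a, b in pairs) / len(pairs)


def diversity(summary: Summary,
              set_dist: Callable[[Set[str], Set[str]], float]) -> float:
    """Average pairwise distance between the k sets (higher = more diverse)."""
    pairs = list(combinations(range(len(summary)), 2))
    if not pairs:
        return 0.0
    return sum(set_dist(summary[i], summary[j]) for i, j in pairs) / len(pairs)


def top1sum_step(current: Summary,
                 candidates: Sequence[Summary],
                 item_sim: Callable[[str, str], float],
                 set_dist: Callable[[Set[str], Set[str]], float]) -> Summary:
    """Greedily pick the candidate summary with the highest utility."""
    def utility(summary: Summary) -> float:
        mean_unif = sum(uniformity(s, item_sim) for s in summary) / len(summary)
        return mean_unif + diversity(summary, set_dist)

    return max(candidates, key=utility) if candidates else current
```

RLSum, by contrast, replaces this exhaustive per-step scoring with a Deep Reinforcement Learning policy that is rewarded for producing diverse, previously unseen collections of uniform sets.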
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles [136.84278943588652]
We propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.
To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm.
The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference.
arXiv Detail & Related papers (2023-09-17T20:28:17Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input.
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
- UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization [54.59104881168188]
UniSumm is a unified few-shot summarization model pre-trained with multiple summarization tasks.
SummZoo is a new benchmark to better evaluate few-shot summarizers.
arXiv Detail & Related papers (2022-11-17T18:54:47Z)
- EntSUM: A Data Set for Entity-Centric Summarization [27.845014142019917]
Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences.
We introduce EntSUM, a human-annotated data set for controllable summarization with a focus on named entities as the aspects to control.
arXiv Detail & Related papers (2022-04-05T13:45:54Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
- SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy [92.5683788430012]
SupMMD is a novel technique for generic and update summarization based on the maximum mean discrepancy from kernel two-sample testing.
We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.
arXiv Detail & Related papers (2020-10-06T09:26:55Z)
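For readers unfamiliar with kernel two-sample testing, the quantity SupMMD builds on is the maximum mean discrepancy; the expression below is the standard (biased) empirical estimator, stated here for illustration rather than taken from the SupMMD paper itself.

```latex
% Standard biased empirical estimate of squared MMD between samples
% X = {x_1,...,x_m} and Y = {y_1,...,y_n} under a kernel k. This is the
% textbook two-sample quantity; SupMMD's exact training objective may differ.
\[
\widehat{\mathrm{MMD}}^2(X, Y)
= \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} k(x_i, x_j)
- \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j)
+ \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(y_i, y_j).
\]
```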