WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
- URL: http://arxiv.org/abs/2011.07832v1
- Date: Mon, 16 Nov 2020 10:02:52 GMT
- Title: WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
- Authors: Hiroaki Hayashi, Prashant Budania, Peng Wang, Chris Ackerson, Raj Neervannan, Graham Neubig
- Abstract summary: We propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation.
Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
- Score: 69.13865812754058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aspect-based summarization is the task of generating focused summaries based
on specific points of interest. Such summaries aid efficient analysis of text,
such as quickly understanding reviews or opinions from different angles.
However, due to large differences in the type of aspects for different domains
(e.g., sentiment, product features), the development of previous models has
tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale
dataset for multi-domain aspect-based summarization that attempts to spur
research in the direction of open-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different
domains, using the section titles and boundaries of each article as a proxy for
aspect annotation. We propose several straightforward baseline models for this
task and conduct experiments on the dataset. Results highlight key challenges
that existing summarization models face in this setting, such as proper pronoun
handling of quoted sources and consistent explanation of time-sensitive events.
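The idea of treating section titles and boundaries as aspect labels can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `split_by_aspect`, the MediaWiki-style `== Title ==` heading syntax, and the toy article are all assumptions made for illustration only.

```python
import re

# Assumption: level-two headings look like "== Title ==" (MediaWiki-style).
# The abstract does not specify any markup; this pattern is illustrative.
HEADING = re.compile(r"^==\s*([^=]+?)\s*==\s*$")


def split_by_aspect(article_text):
    """Map each section title (used as the aspect) to the text within its boundaries."""
    sections = {}
    current = None  # text before the first heading has no aspect and is skipped
    for line in article_text.splitlines():
        match = HEADING.match(line)
        if match:
            # A heading closes the previous section and opens a new aspect.
            current = match.group(1).lower()
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {aspect: " ".join(lines) for aspect, lines in sections.items()}


# Toy article (hypothetical content, not taken from the dataset).
example = """Intro text.
== History ==
Founded in 1900.
It grew quickly.
== Reception ==
Critics praised it.
"""
pairs = split_by_aspect(example)
for aspect, text in pairs.items():
    print(aspect, "->", text)
```

Each key of `pairs` plays the role of an aspect and each value the role of an aspect-specific reference text. Only level-two headings are recognized in this sketch; deeper subsections are folded into their parent section's text.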
Related papers
- Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
- OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization [19.079053035229695]
We introduce OpenAsp, a benchmark for aspect-based summarization.
We show that the realistic open-aspect setting realized in OpenAsp poses a challenge for current state-of-the-art summarization models.
arXiv Detail & Related papers (2023-12-07T17:06:20Z)
- SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation [62.889835139583965]
We introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data.
As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data.
Our experiments demonstrate that our method achieves a better performance than the current state of the art, both in real-to-real and synthetic-to-real scenarios.
arXiv Detail & Related papers (2023-04-06T17:36:23Z)
- OASum: Large-Scale Open Domain Aspect-based Summarization [29.45232847592956]
We take advantage of crowd-sourcing knowledge on Wikipedia.org and automatically create a high-quality, large-scale aspect-based summarization dataset named OASum.
OASum contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages.
To overcome the data scarcity problem on specific domains, we also perform zero-shot, few-shot, and fine-tuning on seven downstream datasets.
arXiv Detail & Related papers (2022-12-19T04:04:17Z)
- Syntax-Guided Domain Adaptation for Aspect-based Sentiment Analysis [23.883810236153757]
Domain adaptation is a popular solution to alleviate the data deficiency issue in new domains by transferring common knowledge across domains.
We propose a novel Syntax-guided Domain Adaptation Model, named SDAM, for more effective cross-domain ABSA.
Our model consistently outperforms the state-of-the-art baselines with respect to Micro-F1 metric for the cross-domain End2End ABSA task.
arXiv Detail & Related papers (2022-11-10T10:09:33Z)
- Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
- Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often fall more closely into aspects, broad topics in a dataset the user is interested in rather than specific queries.
We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model.
We evaluate on two aspect-oriented datasets and find this approach yields focused summaries, better than those from a generic summarization system.
arXiv Detail & Related papers (2021-10-15T18:06:21Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways.
arXiv Detail & Related papers (2020-10-11T02:19:15Z)
- Unsupervised Domain Adaptation in Semantic Segmentation: a Review [22.366638308792734]
The aim of this paper is to give an overview of the recent advancements in the Unsupervised Domain Adaptation (UDA) of deep networks for semantic segmentation.
arXiv Detail & Related papers (2020-05-21T20:10:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.