WSL-DS: Weakly Supervised Learning with Distant Supervision for Query
Focused Multi-Document Abstractive Summarization
- URL: http://arxiv.org/abs/2011.01421v1
- Date: Tue, 3 Nov 2020 02:02:55 GMT
- Title: WSL-DS: Weakly Supervised Learning with Distant Supervision for Query
Focused Multi-Document Abstractive Summarization
- Authors: Md Tahmid Rahman Laskar, Enamul Hoque, Jimmy Xiangji Huang
- Abstract summary: In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given where the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach via utilizing distant supervision.
- Score: 16.048329028104643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of
documents and a query are given where the goal is to generate a summary from
these documents based on the given query. However, one major challenge for this
task is the lack of availability of labeled training datasets. To overcome this
issue, in this paper, we propose a novel weakly supervised learning approach
via utilizing distant supervision. In particular, we use datasets similar to
the target dataset as the training data where we leverage pre-trained sentence
similarity models to generate the weak reference summary of each individual
document in a document set from the multi-document gold reference summaries.
Then, we iteratively train our summarization model on each single document to
alleviate the computational complexity issue that occurs when training neural
summarization models on multiple documents (i.e., long sequences) at once.
Experimental results on Document Understanding Conference (DUC) datasets show
that our proposed approach sets a new state-of-the-art result in terms of
various evaluation metrics.
Related papers
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document arrives.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
- Absformer: Transformer-based Model for Unsupervised Multi-Document Abstractive Summarization [1.066048003460524]
Multi-document summarization (MDS) refers to the task of summarizing the text in multiple documents into a concise summary.
Abstractive MDS aims to generate a coherent and fluent summary for multiple documents using natural language generation techniques.
We propose Absformer, a new Transformer-based method for unsupervised abstractive summary generation.
arXiv Detail & Related papers (2023-06-07T21:18:23Z)
- Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
- How "Multi" is Multi-Document Summarization? [15.574673241564932]
It is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on dispersed information.
We propose an automated measure for evaluating the degree to which a summary is "dispersed".
Our results show that certain MDS datasets barely require combining information from multiple documents, where a single document often covers the full summary content.
arXiv Detail & Related papers (2022-10-23T10:20:09Z)
- ACM -- Attribute Conditioning for Abstractive Multi Document Summarization [0.0]
We propose a model that incorporates attribute conditioning modules in order to decouple conflicting information by conditioning for a certain attribute in the output summary.
This approach shows strong gains in ROUGE score over baseline multi document summarization approaches.
arXiv Detail & Related papers (2022-05-09T00:00:14Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
- AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization [17.098075160558576]
We propose a scalable approach called AQuaMuSe to automatically mine qMDS examples from question answering datasets and large document corpora.
We publicly release a specific instance of an AQuaMuSe dataset with 5,519 query-based summaries, each associated with an average of 6 input documents selected from an index of 355M documents from Common Crawl.
arXiv Detail & Related papers (2020-10-23T22:38:18Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
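The SummPip pipeline described above (sentence graph, clustering, per-cluster compression) can be sketched in miniature. SummPip applies spectral clustering over linguistic and deep sentence representations; this dependency-free sketch substitutes word-overlap similarity, connected components of a thresholded graph in place of spectral clustering, and a keep-the-longest-sentence heuristic in place of learned compression, so it illustrates the pipeline's shape rather than the actual method.

```python
def cluster_sentences(sentences, threshold=0.2):
    """Group sentences whose Jaccard word overlap exceeds `threshold`.

    Connected components of the thresholded similarity graph are found
    with a small union-find structure.
    """
    n = len(sentences)
    bows = [set(s.lower().split()) for s in sentences]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            jaccard = len(bows[i] & bows[j]) / len(bows[i] | bows[j])
            if jaccard >= threshold:
                parent[find(i)] = find(j)  # union the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(sentences[i])
    return list(clusters.values())


def summarize(sentences, threshold=0.2):
    """'Compress' each cluster by keeping only its longest sentence."""
    return [max(c, key=len) for c in cluster_sentences(sentences, threshold)]


sentences = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stocks fell sharply today",
]
print(summarize(sentences))
# -> ['the cat sat on the mat', 'stocks fell sharply today']
```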
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- A Divide-and-Conquer Approach to the Summarization of Long Documents [4.863209463405628]
We present a novel divide-and-conquer method for the neural summarization of long documents.
Our method exploits the discourse structure of the document and uses sentence similarity to split the problem into smaller summarization problems.
We demonstrate that this approach paired with different summarization models, including sequence-to-sequence RNNs and Transformers, can lead to improved summarization performance.
arXiv Detail & Related papers (2020-04-13T20:38:49Z)
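The divide-and-conquer recipe above can be sketched generically: split a long document into shorter segments at points of low adjacent-sentence similarity, summarize each segment independently, and concatenate the partial summaries. A real system would run a neural model (e.g., a sequence-to-sequence RNN or Transformer) on each segment; a lead-sentence heuristic and a simple word-overlap similarity stand in here, and both are illustrative assumptions rather than the paper's method.

```python
def word_overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)


def split_segments(sentences, threshold=0.1):
    """Break the sentence list wherever adjacent similarity drops."""
    if not sentences:
        return []
    segments, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if word_overlap(prev, sent) < threshold:
            segments.append(current)
            current = []
        current.append(sent)
    segments.append(current)
    return segments


def divide_and_conquer_summary(sentences, threshold=0.1):
    """Summarize each segment (lead-sentence stand-in) and concatenate."""
    return " ".join(seg[0] for seg in split_segments(sentences, threshold))


doc = [
    "the league announced the final score",
    "the score decided the league title",
    "rain is expected across the coast tomorrow",
    "coastal rain may continue into the weekend",
]
print(divide_and_conquer_summary(doc))
# -> the league announced the final score rain is expected across the coast tomorrow
```

The point of the decomposition is that each segment is short enough for a neural summarizer to handle, sidestepping the long-sequence cost that whole-document training incurs.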
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.