Unsupervised Abstractive Summarization of Bengali Text Documents
- URL: http://arxiv.org/abs/2102.04490v2
- Date: Fri, 19 Feb 2021 16:37:51 GMT
- Title: Unsupervised Abstractive Summarization of Bengali Text Documents
- Authors: Radia Rayan Chowdhury, Mir Tafseer Nayeem, Tahsin Tasnim Mim, Md.
Saifur Rahman Chowdhury, Taufiqul Jannat
- Abstract summary: We propose a graph-based unsupervised abstractive summarization system in the single-document setting for Bengali text documents.
We also provide a human-annotated dataset with document-summary pairs to evaluate our abstractive model and to support the comparison of future abstractive summarization systems for the Bengali language.
- Score: 0.5249805590164901
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Abstractive summarization systems generally rely on large collections of
document-summary pairs. However, the performance of abstractive systems remains
a challenge due to the unavailability of parallel data for low-resource
languages like Bengali. To overcome this problem, we propose a graph-based
unsupervised abstractive summarization system in the single-document setting
for Bengali text documents, which requires only a Part-Of-Speech (POS) tagger
and a pre-trained language model trained on Bengali texts. We also provide a
human-annotated dataset with document-summary pairs to evaluate our abstractive
model and to support the comparison of future abstractive summarization systems
for the Bengali language. We conduct experiments on this dataset and compare our
system with several well-established unsupervised extractive summarization
systems. Our unsupervised abstractive summarization model outperforms the
baselines without being exposed to any human-annotated reference summaries.
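The abstract does not detail the graph construction, so the following is only a generic word-graph sentence-fusion sketch of the kind such unsupervised abstractive systems often build on, not the authors' method; the POS-tag filtering and language-model reranking the paper mentions are omitted, and all names here are illustrative:

```python
from collections import defaultdict
from heapq import heappush, heappop

def build_word_graph(sentences):
    # Directed graph over word tokens; identical words share one node,
    # so overlapping sentences create shortcut paths (sentence fusion).
    graph = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            graph[a][b] += 1  # edge weight = bigram frequency
    return graph

def fuse(sentences):
    # Dijkstra from <s> to </s>; frequent bigrams get cheap edges,
    # so the fused sentence prefers paths shared across inputs.
    graph = build_word_graph(sentences)
    dist, heap = {}, [(0.0, "<s>", [])]
    while heap:
        d, node, path = heappop(heap)
        if node in dist:
            continue
        dist[node] = d
        path = path + [node]
        if node == "</s>":
            return " ".join(path[1:-1])  # strip boundary markers
        for nxt, w in graph[node].items():
            if nxt not in dist:
                heappush(heap, (d + 1.0 / w, nxt, path))
    return ""
```

A real system would constrain paths to keep content words (via the POS tagger) and rerank candidate paths with the pre-trained Bengali language model; this sketch only shows the graph-path idea.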
Related papers
- From News to Summaries: Building a Hungarian Corpus for Extractive and Abstractive Summarization [0.19107347888374507]
HunSum-2 is an open-source Hungarian corpus suitable for training abstractive and extractive summarization models.
The dataset is assembled from thoroughly cleaned segments of the Common Crawl corpus.
arXiv Detail & Related papers (2024-04-04T16:07:06Z)
- Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization [44.835811239393244]
Sentence summarization shortens a given text while preserving its core content.
Unsupervised approaches have been studied to summarize texts without human-written summaries.
We devise an abstractive model based on reinforcement learning without ground-truth summaries.
arXiv Detail & Related papers (2022-12-21T08:34:28Z)
- Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
- EASE: Extractive-Abstractive Summarization with Explanations [18.046254486733186]
We present an explainable summarization system based on the Information Bottleneck principle.
Inspired by previous research that humans use a two-stage framework to summarize long documents, our framework first extracts a pre-defined amount of evidence spans as explanations.
We show that explanations from our framework are more relevant than simple baselines, without substantially sacrificing the quality of the generated summary.
arXiv Detail & Related papers (2021-05-14T17:45:06Z)
- Unsupervised Opinion Summarization with Content Planning [58.5308638148329]
We show that explicitly incorporating content planning in a summarization model yields output of higher quality.
We also create synthetic datasets which are more natural, resembling real world document-summary pairs.
Our approach outperforms competitive models in generating informative, coherent, and fluent summaries.
arXiv Detail & Related papers (2020-12-14T18:41:58Z)
- Bengali Abstractive News Summarization (BANS): A Neural Attention Approach [0.8793721044482612]
We present a seq2seq-based Long Short-Term Memory (LSTM) network model with attention at the encoder-decoder.
Our proposed system deploys a local attention-based model that produces long sequences of words in lucid, human-like sentences.
We also prepared a dataset of more than 19k articles and corresponding human-written summaries collected from bangla.bdnews24.com.
arXiv Detail & Related papers (2020-12-03T08:17:31Z)
- Liputan6: A Large-scale Indonesian Dataset for Text Summarization [43.375797352517765]
We harvest articles from Liputan6.com, an online news portal, and obtain 215,827 document-summary pairs.
We leverage pre-trained language models to develop benchmark extractive and abstractive summarization methods over the dataset.
arXiv Detail & Related papers (2020-11-02T02:01:12Z)
- Multi-Fact Correction in Abstractive Text Summarization [98.27031108197944]
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text.
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-10-06T02:51:02Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
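The pipeline above (sentence graph, spectral clustering, per-cluster compression) can be sketched minimally. Everything here is an illustrative assumption: Jaccard word overlap stands in for the paper's combined linguistic and deep representations, the split is a two-way Fiedler-vector bipartition rather than full k-way spectral clustering, and "compression" simply keeps the shortest sentence per cluster:

```python
import numpy as np

def spectral_bipartition(sentences):
    # Similarity graph from Jaccard word overlap (a crude stand-in
    # for learned sentence representations).
    sets = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # The Fiedler vector (eigenvector of the 2nd-smallest eigenvalue)
    # splits the graph into two weakly connected halves by sign.
    _, vecs = np.linalg.eigh(L)
    return [int(x >= 0) for x in vecs[:, 1]]

def compress(sentences, labels):
    # Toy "compression": keep the shortest sentence of each cluster.
    summary = []
    for c in sorted(set(labels)):
        cluster = [s for s, l in zip(sentences, labels) if l == c]
        summary.append(min(cluster, key=len))
    return " ".join(summary)
```

On a document mixing two topics, the bipartition groups same-topic sentences together and the summary keeps one representative per topic.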
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
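The synthetic-pair construction described above can be sketched as follows; `make_synthetic_pairs`, the token-dropout rate, and the shuffle-based noise are illustrative assumptions, not the paper's exact noising procedure:

```python
import random

def make_synthetic_pairs(reviews, n_noisy=2, drop_prob=0.2, seed=0):
    # Each review is "pretended" to be a summary; its noisy versions
    # act as the pseudo-documents a denoising model learns to map back.
    rng = random.Random(seed)
    pairs = []
    for review in reviews:
        tokens = review.split()
        for _ in range(n_noisy):
            noisy = [t for t in tokens if rng.random() > drop_prob]
            rng.shuffle(noisy)  # crude word-order noise
            pairs.append((" ".join(noisy), review))  # (document, summary)
    return pairs
```

A seq2seq model trained on these (noisy document, clean review) pairs then generalizes at test time to summarizing sets of genuine reviews.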
arXiv Detail & Related papers (2020-04-21T16:54:57Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper shifts the paradigm for building neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
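The summary-as-matching idea above, scoring whole candidate summaries against the document rather than ranking sentences one by one, can be illustrated with a toy bag-of-words sketch; the actual system matches learned semantic representations, which this cosine-over-word-counts stand-in only approximates:

```python
from itertools import combinations
from collections import Counter
import math

def bow_cosine(a, b):
    # Cosine similarity between two bag-of-words token lists.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_matching_summary(sentences, k=2):
    # Score each k-sentence candidate *as a whole* against the full
    # document: summary-level matching, not sentence-level ranking.
    doc_tokens = " ".join(sentences).lower().split()
    best, best_score = None, -1.0
    for cand in combinations(sentences, k):
        cand_tokens = " ".join(cand).lower().split()
        score = bow_cosine(cand_tokens, doc_tokens)
        if score > best_score:
            best, best_score = cand, score
    return list(best)
```

Enumerating all candidates is exponential in general; the point of the matching formulation is that a learned scorer can compare candidate summaries holistically instead of greedily picking sentences.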
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.