Bengali Abstractive News Summarization (BANS): A Neural Attention Approach
- URL: http://arxiv.org/abs/2012.01747v1
- Date: Thu, 3 Dec 2020 08:17:31 GMT
- Title: Bengali Abstractive News Summarization (BANS): A Neural Attention Approach
- Authors: Prithwiraj Bhattacharjee, Avi Mallick, Md Saiful Islam,
Marium-E-Jannat
- Abstract summary: We present a seq2seq-based Long Short-Term Memory (LSTM) network model with attention at the encoder-decoder.
Our proposed system deploys a local attention-based model that produces a long sequence of words in lucid, human-like sentences.
We also prepared a dataset of more than 19k articles and corresponding human-written summaries collected from bangla.bdnews24.com.
- Score: 0.8793721044482612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Abstractive summarization is the process of generating novel sentences based
on the information extracted from the original text document while retaining its
context. Due to the underlying complexities of abstractive summarization, most
past research has focused on the extractive approach. Nevertheless, with the
success of the sequence-to-sequence (seq2seq) model, abstractive summarization
has become more viable. Although a significant amount of notable research on
abstractive summarization has been done for English, only a couple of works
address Bengali abstractive news summarization (BANS). In this article, we
present a seq2seq-based Long Short-Term Memory (LSTM) network model with
attention at the encoder-decoder. Our proposed system deploys a local
attention-based model that produces a long sequence of words in lucid,
human-like sentences that retain the noteworthy information of the original
document. We also prepared a dataset of more than 19k articles and
corresponding human-written summaries collected from bangla.bdnews24.com,
which is to date the most extensive dataset for Bengali news document
summarization and is publicly available on Kaggle. We evaluated our model
qualitatively and quantitatively and compared it with other published results;
it showed significant improvement in human evaluation scores over
state-of-the-art approaches for BANS.
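The abstract's "local attention-based model" can be illustrated with a minimal NumPy sketch of Luong-style local attention: at each decoding step, only encoder states inside a small window around a predicted alignment position are scored, and the context vector is their weighted sum. The function name, the dot-product scoring, and the fixed window position are illustrative assumptions for this sketch, not the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def local_attention(enc_states, dec_state, pos, window=2):
    """Luong-style local attention sketch.

    enc_states : (T, d) encoder hidden states
    dec_state  : (d,)   current decoder hidden state
    pos        : int    predicted alignment position on the source
    window     : int    half-width D of the local window
    Returns the context vector and the full-length attention weights
    (zero outside the window).
    """
    T = enc_states.shape[0]
    lo, hi = max(0, pos - window), min(T, pos + window + 1)
    scores = enc_states[lo:hi] @ dec_state   # dot-product scoring inside the window
    weights = softmax(scores)                # attention distribution over the window
    context = weights @ enc_states[lo:hi]    # (d,) weighted sum of encoder states
    full = np.zeros(T)
    full[lo:hi] = weights
    return context, full

# Toy example: 6 source positions, 4-dimensional hidden states.
rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 4))
dec = rng.normal(size=4)
ctx, w = local_attention(enc, dec, pos=3, window=2)
```

In a full seq2seq decoder, the context vector would be concatenated with the decoder state before predicting the next word; restricting attention to a window keeps the weights focused when the source article is long.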
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is proposed.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- From News to Summaries: Building a Hungarian Corpus for Extractive and Abstractive Summarization [0.19107347888374507]
HunSum-2 is an open-source Hungarian corpus suitable for training abstractive and extractive summarization models.
The dataset is assembled from segments of the Common Crawl corpus undergoing thorough cleaning.
arXiv Detail & Related papers (2024-04-04T16:07:06Z)
- AugSumm: towards generalizable speech summarization using synthetic labels from large language model [61.73741195292997]
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech.
Conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary.
We propose AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries.
arXiv Detail & Related papers (2024-01-10T18:39:46Z)
- Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization [44.835811239393244]
Sentence summarization shortens given texts while maintaining core contents of the texts.
Unsupervised approaches have been studied to summarize texts without human-written summaries.
We devise an abstractive model based on reinforcement learning without ground-truth summaries.
arXiv Detail & Related papers (2022-12-21T08:34:28Z)
- Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with a flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON)
SEASON utilizes the allocation of salience expectation to guide abstractive summarization and adapts well to articles in different abstractiveness.
arXiv Detail & Related papers (2022-10-22T02:13:44Z)
- RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization [25.434558112121778]
We propose a novel retrieval enhanced abstractive summarization framework consisting of a dense Retriever and a Summarizer.
We validate our method on a wide range of summarization datasets across multiple domains and two backbone models: BERT and BART.
Results show that our framework obtains a significant improvement of 1.38 to 4.66 in ROUGE-1 score when compared with the powerful pre-trained models.
arXiv Detail & Related papers (2021-09-16T12:52:48Z)
- StreamHover: Livestream Transcript Summarization and Annotation [54.41877742041611]
We present StreamHover, a framework for annotating and summarizing livestream transcripts.
With a total of over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is significantly larger than currently existing annotated corpora.
We show that our model generalizes better and improves performance over strong baselines.
arXiv Detail & Related papers (2021-09-11T02:19:37Z)
- Text Summarization of Czech News Articles Using Named Entities [0.0]
We focus on the impact of named entities on the summarization of Czech news articles.
We propose a new metric ROUGE_NE that measures the overlap of named entities between the true and generated summaries.
We show that it is still challenging for summarization systems to reach a high score in it.
arXiv Detail & Related papers (2021-04-21T10:48:14Z)
- Unsupervised Abstractive Summarization of Bengali Text Documents [0.5249805590164901]
We propose a graph-based unsupervised abstractive summarization system in the single-document setting for Bengali text documents.
We also provide a human-annotated dataset with document-summary pairs to evaluate our abstractive model and to support the comparison of future abstractive summarization systems for the Bengali language.
arXiv Detail & Related papers (2021-01-26T11:41:28Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.