Topic-Centric Unsupervised Multi-Document Summarization of Scientific
and News Articles
- URL: http://arxiv.org/abs/2011.08072v1
- Date: Tue, 3 Nov 2020 04:04:21 GMT
- Title: Topic-Centric Unsupervised Multi-Document Summarization of Scientific
and News Articles
- Authors: Amanuel Alambo, Cori Lohstroh, Erik Madaus, Swati Padhee, Brandy
Foster, Tanvi Banerjee, Krishnaprasad Thirunarayan, Michael Raymer
- Abstract summary: We propose a topic-centric unsupervised multi-document summarization framework to generate abstractive summaries.
The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques.
Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics.
- Score: 3.0504782036247438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in natural language processing have enabled automation of a
wide range of tasks, including machine translation, named entity recognition,
and sentiment analysis. Automated summarization of documents, or groups of
documents, however, has remained elusive, with many efforts limited to
extraction of keywords, key phrases, or key sentences. Accurate abstractive
summarization has yet to be achieved due to the inherent difficulty of the
problem, and limited availability of training data. In this paper, we propose a
topic-centric unsupervised multi-document summarization framework to generate
extractive and abstractive summaries for groups of scientific articles across
20 Fields of Study (FoS) in Microsoft Academic Graph (MAG) and news articles
from DUC-2004 Task 2. The proposed algorithm generates an abstractive summary
by developing salient language unit selection and text generation techniques.
Our approach matches the state-of-the-art when evaluated on automated
extractive evaluation metrics and performs better for abstractive summarization
on five human evaluation metrics (entailment, coherence, conciseness,
readability, and grammar). We achieve a kappa score of 0.68 between two
co-author linguists who evaluated our results. We plan to publicly share
MAG-20, a human-validated gold standard dataset of topic-clustered research
articles and their summaries to promote research in abstractive summarization.
Related papers
- SKT5SciSumm -- Revisiting Extractive-Generative Approach for Multi-Document Scientific Summarization [24.051692189473723]
We propose SKT5SciSumm - a hybrid framework for multi-document scientific summarization (MDSS)
We leverage the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed Transformers (SPECTER) to encode and represent textual sentences.
We employ the T5 family of models to generate abstractive summaries using extracted sentences.
arXiv Detail & Related papers (2024-02-27T08:33:31Z) - OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z) - Large Language Models are Diverse Role-Players for Summarization
Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most of the automatic evaluation methods like BLUE/ROUGE may be not able to adequately capture the above dimensions.
We propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - Lay Text Summarisation Using Natural Language Processing: A Narrative
Literature Review [1.8899300124593648]
The aim of this literature review is to describe and compare the different text summarisation approaches used to generate lay summaries.
We screened 82 articles and included eight relevant papers published between 2020 and 2021, using the same dataset.
A combination of extractive and abstractive summarisation methods in a hybrid approach was found to be most effective.
arXiv Detail & Related papers (2023-03-24T18:30:50Z) - Controllable Abstractive Dialogue Summarization with Sketch Supervision [56.59357883827276]
Our model achieves state-of-the-art performance on the largest dialogue summarization corpus SAMSum, with as high as 50.79 in ROUGE-L score.
arXiv Detail & Related papers (2021-05-28T19:05:36Z) - Bengali Abstractive News Summarization(BANS): A Neural Attention
Approach [0.8793721044482612]
We present a seq2seq based Long Short-Term Memory (LSTM) network model with attention at encoder-decoder.
Our proposed system deploys a local attention-based model that produces a long sequence of words with lucid and human-like generated sentences.
We also prepared a dataset of more than 19k articles and corresponding human-written summaries collected from bangla.bdnews24.com1.
arXiv Detail & Related papers (2020-12-03T08:17:31Z) - What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction [1.0681288493631977]
We introduce Global and Local Embedding Automatic Keyphrase Extractor (GLEAKE) for the task of automatic keyphrase extraction.
GLEAKE uses single and multi-word embedding techniques to explore the syntactic and semantic aspects of the candidate phrases.
It refines the most significant phrases as a final set of keyphrases.
arXiv Detail & Related papers (2020-05-19T20:24:02Z) - From Standard Summarization to New Tasks and Beyond: Summarization with
Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper focuses on the survey of these new summarization tasks and approaches in the real-world application.
arXiv Detail & Related papers (2020-05-10T14:59:36Z) - Interpretable Multi-Headed Attention for Abstractive Summarization at
Controllable Lengths [14.762731718325002]
Multi-level Summarizer (MLS) is a supervised method to construct abstractive summaries of a text document at controllable lengths.
MLS outperforms strong baselines by up to 14.70% in the METEOR score.
arXiv Detail & Related papers (2020-02-18T19:40:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.