Text Summarization of Czech News Articles Using Named Entities
- URL: http://arxiv.org/abs/2104.10454v1
- Date: Wed, 21 Apr 2021 10:48:14 GMT
- Title: Text Summarization of Czech News Articles Using Named Entities
- Authors: Petr Marek, Štěpán Müller, Jakub Konrád, Petr Lorenc, Jan Pichl and Jan Šedivý
- Abstract summary: We focus on the impact of named entities on the summarization of Czech news articles.
We propose a new metric ROUGE_NE that measures the overlap of named entities between the true and generated summaries.
We show that it is still challenging for summarization systems to reach a high score in it.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The foundation for research on summarization in the Czech language was
laid by the work of Straka et al. (2018). They published SumeCzech, a large
Czech news-based summarization dataset, and proposed several baseline
approaches. However, it is clear from the achieved results that there is
considerable room for improvement. In our work, we focus on the impact of named
entities on the summarization of Czech news articles. First, we annotate
SumeCzech with named entities. We propose a new metric, ROUGE_NE, that measures
the overlap of named entities between the true and generated summaries, and we
show that it is still challenging for summarization systems to reach a high
score on it (a minimal sketch of such an overlap score follows below).
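The abstract does not give the exact formula for ROUGE_NE, so the following is only a minimal sketch of an entity-overlap score in its spirit, assuming precision, recall, and F1 over named-entity mentions; the `extract_entities` function is a hypothetical stand-in for a real Czech NER model.

```python
# Minimal sketch of an entity-overlap score in the spirit of ROUGE_NE.
# Assumption: precision/recall/F1 over named-entity mentions in the true
# vs. generated summary; the paper's exact definition may differ.
from collections import Counter

def extract_entities(text: str) -> list[str]:
    # Hypothetical stand-in for a real Czech NER model: treats non-initial
    # capitalized words as entity mentions. Replace with an actual tagger.
    words = text.split()
    return [w.strip(".,") for w in words[1:] if w[:1].isupper()]

def rouge_ne(true_summary: str, generated_summary: str) -> dict[str, float]:
    ref = Counter(extract_entities(true_summary))
    gen = Counter(extract_entities(generated_summary))
    overlap = sum((ref & gen).values())            # matched entity mentions
    precision = overlap / max(sum(gen.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```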
We propose an extractive summarization approach, Named Entity Density, that
selects as the article's summary the sentence with the highest ratio of
named-entity mentions to sentence length. The experiments show that in the news
domain the proposed approach reaches results close to the solid baseline of
selecting the article's first sentence. Moreover, we demonstrate that the
selected sentence concisely reflects the style of news reports, identifying who
was involved and when, where, and what happened. We propose that such a summary
is beneficial in combination with the first sentence of an article in voice
applications that present news articles (the selection rule is sketched below).
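A sketch of the selection rule under stated assumptions (whitespace tokenization for sentence length, and the same toy stand-in for NER as in the sketch above):

```python
# Sketch of the Named Entity Density baseline: return, as the summary, the
# sentence with the highest ratio of entity mentions to sentence length.
# The tokenizer and NER below are toy stand-ins, not the paper's setup.
def extract_entities(sentence: str) -> list[str]:
    words = sentence.split()
    return [w.strip(".,") for w in words[1:] if w[:1].isupper()]

def named_entity_density(sentence: str) -> float:
    tokens = sentence.split()
    return len(extract_entities(sentence)) / len(tokens) if tokens else 0.0

def summarize(sentences: list[str]) -> str:
    # Pick the single sentence with the highest entity density.
    return max(sentences, key=named_entity_density)
```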
We propose two abstractive summarization approaches based on the Seq2Seq
architecture. The first approach uses the tokens of the article alone. The
second approach also has access to the named-entity annotations. The
experiments show that both approaches exceed the state-of-the-art results
previously reported by Straka et al. (2018), with the latter achieving slightly
better results on SumeCzech's out-of-domain test set (one plausible way to feed
entity annotations to such a model is sketched below).
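The abstract does not describe how the second model consumes the entity annotations. Purely as an assumption, one common scheme is to interleave entity-type markers with the article tokens before encoding; the `tag_input` helper and the BIO tag inventory below are hypothetical illustrations, not the paper's method.

```python
# Illustrative only: expose named-entity annotations to a Seq2Seq summarizer
# by interleaving entity-type markers with the input tokens. The tag names
# and this preprocessing scheme are assumptions, not the paper's method.
def tag_input(tokens: list[str], entity_tags: list[str]) -> list[str]:
    # Insert a marker token such as <B-PER> before each tagged word.
    assert len(tokens) == len(entity_tags)
    tagged: list[str] = []
    for token, tag in zip(tokens, entity_tags):
        if tag != "O":
            tagged.append(f"<{tag}>")
        tagged.append(token)
    return tagged

# Example: "Prezident Zeman navštívil Brno" with PER and LOC annotations.
print(tag_input(["Prezident", "Zeman", "navštívil", "Brno"],
                ["O", "B-PER", "O", "B-LOC"]))
# -> ['Prezident', '<B-PER>', 'Zeman', 'navštívil', '<B-LOC>', 'Brno']
```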
Related papers
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces, user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review [1.8899300124593648]
The aim of this literature review is to describe and compare the different text summarisation approaches used to generate lay summaries.
We screened 82 articles and included eight relevant papers published between 2020 and 2021, using the same dataset.
A combination of extractive and abstractive summarisation methods in a hybrid approach was found to be most effective.
arXiv Detail & Related papers (2023-03-24T18:30:50Z)
- Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
SEASON utilizes the allocation of salience expectation to guide abstractive summarization and adapts well to articles with different levels of abstractiveness.
arXiv Detail & Related papers (2022-10-22T02:13:44Z)
- Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z)
- Comparing Methods for Extractive Summarization of Call Centre Dialogue [77.34726150561087]
We experimentally compare several such methods by using them to produce summaries of calls, and evaluating these summaries objectively.
We found that TopicSum and Lead-N outperform the other summarisation methods, whilst BERTSum received comparatively lower scores in both subjective and objective evaluations.
arXiv Detail & Related papers (2022-09-06T13:16:02Z)
- Reinforcing Semantic-Symmetry for Document Summarization [15.113768658584979]
Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions.
This paper introduces a new reinforcing semantic-symmetry learning model for document summarization.
A series of experiments have been conducted on two widely used benchmark datasets, CNN/Daily Mail and BigPatent.
arXiv Detail & Related papers (2021-12-14T17:41:37Z)
- A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles [24.528177249269582]
This paper presents a novel two-stage framework to extract opinionated sentences from a given news article.
In the first stage, a Naive Bayes classifier uses local features to assign a score to each sentence.
In the second stage, we use this prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article.
arXiv Detail & Related papers (2021-01-24T16:24:20Z)
- Bengali Abstractive News Summarization (BANS): A Neural Attention Approach [0.8793721044482612]
We present a seq2seq-based Long Short-Term Memory (LSTM) network model with attention at the encoder-decoder.
Our proposed system deploys a local attention-based model that produces long sequences of words in lucid, human-like sentences.
We also prepared a dataset of more than 19k articles and corresponding human-written summaries collected from bangla.bdnews24.com.
arXiv Detail & Related papers (2020-12-03T08:17:31Z)
- Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction [31.648764677078837]
Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information.
We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics.
Our proposed method achieves a new state of the art for unsupervised sentence summarization according to ROUGE scores.
arXiv Detail & Related papers (2020-05-04T19:01:55Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)