DeepTitle -- Leveraging BERT to generate Search Engine Optimized
Headlines
- URL: http://arxiv.org/abs/2107.10935v1
- Date: Thu, 22 Jul 2021 21:32:54 GMT
- Title: DeepTitle -- Leveraging BERT to generate Search Engine Optimized
Headlines
- Authors: Cristian Anastasiu and Hanna Behnke and Sarah Lück and Viktor
Malesevic and Aamna Najmi and Javier Poveda-Panter
- Abstract summary: We showcase how a pre-trained language model can be leveraged to create an abstractive news headline generator for the German language.
We incorporate state-of-the-art fine-tuning techniques for abstractive text summarization, i.e. we use different optimizers for the encoder and decoder.
We conduct experiments on a German news data set and achieve a ROUGE-L-gram F-score of 40.02.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automated headline generation for online news articles is not a trivial task:
machine-generated titles need to be grammatically correct, informative,
capture attention and generate search traffic without being "click baits" or
"fake news". In this paper we showcase how a pre-trained language model can be
leveraged to create an abstractive news headline generator for German language.
We incorporate state of the art fine-tuning techniques for abstractive text
summarization, i.e. we use different optimizers for the encoder and decoder
where the former is pre-trained and the latter is trained from scratch. We
modify the headline generation to incorporate frequently sought keywords
relevant for search engine optimization. We conduct experiments on a German
news data set and achieve a ROUGE-L-gram F-score of 40.02. Furthermore, we
address the limitations of ROUGE for measuring the quality of text
summarization by introducing a sentence similarity metric and human evaluation.
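For readers unfamiliar with the metric, the following is a minimal sketch of how a ROUGE-L F-score can be computed from the longest common subsequence (LCS) of a generated headline and a reference headline. It is not the authors' evaluation code: the function names `lcs_length` and `rouge_l_f`, the lowercase whitespace tokenization, and the default `beta=1.0` are assumptions made for illustration, and standard ROUGE tooling may apply additional preprocessing. The sketch returns a value between 0 and 1; a figure such as 40.02 would presumably correspond to that value on a 0 to 100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """ROUGE-L F-score of a candidate headline against one reference.

    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

Because the score depends only on token overlap in order, a headline that paraphrases the reference with different words scores low, which is the limitation the abstract addresses with a sentence similarity metric and human evaluation.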
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text
Generation in E-commerce Title and Review Summarization [14.414693156937782]
We propose a novel domain-specific generative pre-training (DS-GPT) method for text generation.
We apply it to the product title and review summarization problems on E-commerce mobile display.
arXiv Detail & Related papers (2021-12-15T19:02:49Z)
- Keyphrase Generation Beyond the Boundaries of Title and Abstract [28.56508031460787]
Keyphrase generation aims at generating phrases (keyphrases) that best describe a given document.
In this work, we explore whether the integration of additional data from semantically similar articles or from the full text of the given article can be helpful for a neural keyphrase generation model.
We discover that adding sentences from the full text particularly in the form of summary of the article can significantly improve the generation of both types of keyphrases.
arXiv Detail & Related papers (2021-12-13T16:33:01Z)
- Domain Controlled Title Generation with Human Evaluation [2.5505887482902287]
A good title allows you to get the attention that your research deserves.
For domain-controlled titles, we used the pre-trained text-to-text transformer model and the additional token technique.
Title tokens are sampled from a local distribution (which is a subset of global vocabulary) of the domain-specific vocabulary and not global vocabulary.
arXiv Detail & Related papers (2021-03-08T20:55:55Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Automatically Ranked Russian Paraphrase Corpus for Text Generation [0.0]
The article is focused on automatic development and ranking of a large corpus for Russian paraphrase generation.
Existing manually annotated paraphrase datasets for Russian are limited to small-sized ParaPhraser corpus and ParaPlag.
arXiv Detail & Related papers (2020-06-17T08:40:52Z)
- GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction [1.0681288493631977]
We introduce Global and Local Embedding Automatic Keyphrase Extractor (GLEAKE) for the task of automatic keyphrase extraction.
GLEAKE uses single and multi-word embedding techniques to explore the syntactic and semantic aspects of the candidate phrases.
It refines the most significant phrases as a final set of keyphrases.
arXiv Detail & Related papers (2020-05-19T20:24:02Z)
- Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system.
It generates a ranked list of quotable paragraphs and spans of tokens from a given source document.
We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.