Cross-Domain Robustness of Transformer-based Keyphrase Generation
- URL: http://arxiv.org/abs/2312.10700v1
- Date: Sun, 17 Dec 2023 12:27:15 GMT
- Title: Cross-Domain Robustness of Transformer-based Keyphrase Generation
- Authors: Anna Glazkova and Dmitry Morozov
- Abstract summary: A list of keyphrases is an important element of a text in databases and repositories of electronic documents.
In our experiments, abstractive text summarization models fine-tuned for keyphrase generation achieve strong results on the target text corpus.
We present an evaluation of fine-tuned BART models for the keyphrase selection task across six benchmark corpora.
- Score: 1.8492669447784602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern models for text generation show state-of-the-art results in many natural language processing tasks. In this work, we explore the effectiveness of abstractive text summarization models for keyphrase selection. A list of keyphrases is an important element of a text in databases and repositories of electronic documents. In our experiments, abstractive text summarization models fine-tuned for keyphrase generation achieve strong results on the target text corpus. However, in most cases, zero-shot performance on other corpora and domains is significantly lower. We investigate the cross-domain limitations of abstractive text summarization models for keyphrase generation. We present an evaluation of fine-tuned BART models for the keyphrase selection task across six benchmark corpora for keyphrase extraction, including scientific texts from two domains and news texts. We explore the role of transfer learning between different domains in improving BART performance on small text corpora. Our experiments show that preliminary fine-tuning on out-of-domain corpora can be effective when the number of in-domain training samples is limited.
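The paper itself includes no code, but the recipe the abstract describes can be sketched concretely. The sketch below, assuming Hugging Face transformers, frames keyphrase generation as sequence-to-sequence summarization (document in, delimiter-separated keyphrases out) and applies the two-stage scheme: preliminary fine-tuning on an out-of-domain corpus, then fine-tuning on the small target corpus. The loader, file names, delimiter, and hyperparameters are illustrative assumptions, not the authors' settings.

import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

SEP = " ; "  # assumed delimiter joining gold keyphrases into one target string

def load_pairs(path):
    """Hypothetical corpus loader; replace with real file reading.
    Returns (document_text, [keyphrase, ...]) tuples."""
    return [("We study transformer models for keyphrase generation ...",
             ["keyphrase generation", "transformers"])]

def batches(pairs, tokenizer, batch_size=8):
    """Tokenize (document -> joined keyphrases) pairs into training batches."""
    for i in range(0, len(pairs), batch_size):
        docs, keyphrases = zip(*pairs[i:i + batch_size])
        enc = tokenizer(list(docs), truncation=True, max_length=512,
                        padding=True, return_tensors="pt")
        labels = tokenizer([SEP.join(k) for k in keyphrases],
                           truncation=True, max_length=64,
                           padding=True, return_tensors="pt").input_ids
        labels[labels == tokenizer.pad_token_id] = -100  # mask padding in loss
        enc["labels"] = labels
        yield enc

def fine_tune(model, tokenizer, pairs, epochs=3, lr=3e-5):
    """One fine-tuning stage over (document, keyphrases) pairs."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in batches(pairs, tokenizer):
            model(**batch).loss.backward()
            opt.step()
            opt.zero_grad()

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Stage 1: preliminary fine-tuning on a larger out-of-domain corpus;
# Stage 2: fine-tuning on the small in-domain target corpus.
fine_tune(model, tokenizer, load_pairs("out_of_domain.jsonl"))
fine_tune(model, tokenizer, load_pairs("target_domain.jsonl"))

# Inference: generate a keyphrase list for an unseen document.
ids = tokenizer("An unseen abstract about news texts ...",
                return_tensors="pt").input_ids
out = model.generate(ids, max_length=64, num_beams=4)
print([p.strip() for p in
       tokenizer.decode(out[0], skip_special_tokens=True).split(";")])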
Related papers
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose paraphrased text span detection (PTD), a novel framework that aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task [0.0]
Transformer-based language models, including ChatGPT, have demonstrated exceptional performance in various natural language generation tasks.
This study compares ChatGPT's keyphrase generation performance with state-of-the-art models, while also testing its potential as a solution for two significant challenges in the field.
arXiv Detail & Related papers (2023-04-27T13:25:43Z) - Keyword Extraction from Short Texts with a Text-To-Text Transfer Transformer [0.0]
The paper explores the relevance of the Text-To-Text Transfer Transformer language model (T5) for Polish to the task of intrinsic and extrinsic keyword extraction from short text passages.
We compare the results obtained by four different methods, i.e., plT5kw, extremeText, TermoPL, and KeyBERT, and conclude that the plT5kw model yields particularly promising results for both frequent and sparsely represented keywords.
arXiv Detail & Related papers (2022-09-28T11:31:43Z) - Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z) - Applying Transformer-based Text Summarization for Keyphrase Generation [2.28438857884398]
Keyphrases are crucial for searching and systematizing scholarly documents.
In this paper, we experiment with popular transformer-based models for abstractive text summarization.
We show that summarization models are quite effective at generating keyphrases in terms of the full-match F1-score and BERTScore (a minimal implementation of the full-match metric appears after this list).
We also investigate several ordering strategies for target keyphrases.
arXiv Detail & Related papers (2022-09-08T13:01:52Z) - Keyphrase Generation Beyond the Boundaries of Title and Abstract [28.56508031460787]
Keyphrase generation aims at generating phrases (keyphrases) that best describe a given document.
In this work, we explore whether the integration of additional data from semantically similar articles or from the full text of the given article can be helpful for a neural keyphrase generation model.
We discover that adding sentences from the full text, particularly in the form of a summary of the article, can significantly improve the generation of both types of keyphrases (present and absent).
arXiv Detail & Related papers (2021-12-13T16:33:01Z) - A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents [29.479331909227998]
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document.
Most existing benchmark datasets for the task typically have limited numbers of annotated documents.
We propose a simple and efficient joint learning approach based on the idea of self-distillation.
arXiv Detail & Related papers (2020-10-22T18:36:31Z) - Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z) - Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
arXiv Detail & Related papers (2020-08-04T18:00:07Z) - Progressive Generation of Long Text with Pretrained Language Models [83.62523163717448]
Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators.
It is still challenging for such models to generate coherent long passages of text, especially when the models are fine-tuned to the target domain on a small corpus.
We propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution.
arXiv Detail & Related papers (2020-06-28T21:23:05Z)
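For reference, the full-match F1-score mentioned in the "Applying Transformer-based Text Summarization for Keyphrase Generation" entry, and used by the main paper, credits a predicted keyphrase only when it exactly matches a gold keyphrase after normalization. A minimal sketch (published setups typically also apply stemming, omitted here for brevity):

def full_match_f1(predicted, gold):
    """Full-match F1 over keyphrase sets: a predicted phrase scores only if
    it exactly matches a gold phrase after lowercasing and whitespace
    normalization. (Published setups usually also stem; omitted here.)"""
    norm = lambda phrases: {" ".join(p.lower().split()) for p in phrases}
    pred, ref = norm(predicted), norm(gold)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

# Two of three predictions match the gold set -> P = R = F1 = 2/3.
print(full_match_f1(["keyphrase generation", "BART", "news texts"],
                    ["bart", "keyphrase generation", "transfer learning"]))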