PatternRank: Leveraging Pretrained Language Models and Part of Speech
for Unsupervised Keyphrase Extraction
- URL: http://arxiv.org/abs/2210.05245v2
- Date: Wed, 12 Oct 2022 08:56:19 GMT
- Title: PatternRank: Leveraging Pretrained Language Models and Part of Speech
for Unsupervised Keyphrase Extraction
- Authors: Tim Schopf, Simon Klimek, Florian Matthes
- Abstract summary: We present PatternRank, which leverages pretrained language models and part-of-speech patterns for unsupervised keyphrase extraction from single documents.
Our experiments show PatternRank achieves higher precision, recall and F1-scores than previous state-of-the-art approaches.
- Score: 0.6767885381740952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keyphrase extraction is the process of automatically selecting a small set of
most relevant phrases from a given text. Supervised keyphrase extraction
approaches need large amounts of labeled training data and perform poorly
outside the domain of the training data. In this paper, we present PatternRank,
which leverages pretrained language models and part-of-speech for unsupervised
keyphrase extraction from single documents. Our experiments show PatternRank
achieves higher precision, recall and F1-scores than previous state-of-the-art
approaches. In addition, we present the KeyphraseVectorizers package, which
allows easy modification of part-of-speech patterns for candidate keyphrase
selection, and hence adaptation of our approach to any domain.
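The candidate-selection step described in the abstract can be sketched with nothing but the Python standard library. The helper below is illustrative only: the coarse tag names and the `ADJ* NOUN+` pattern are assumptions, and PatternRank itself additionally ranks the selected candidates by embedding similarity with a pretrained language model, which is omitted here.

```python
def extract_candidates(tagged_tokens):
    """Select candidate keyphrases matching the POS pattern ADJ* NOUN+.

    tagged_tokens: list of (word, pos) pairs with coarse tags such as
    'ADJ', 'NOUN', 'VERB' (assumed output of some external POS tagger).
    """
    candidates, phrase, seen_noun = [], [], False

    def flush():
        nonlocal phrase, seen_noun
        if seen_noun:                  # only keep phrases that contain a noun
            candidates.append(" ".join(phrase))
        phrase, seen_noun = [], False

    for word, pos in tagged_tokens:
        if pos == "NOUN":
            phrase.append(word)
            seen_noun = True
        elif pos == "ADJ":
            if seen_noun:              # a noun run just ended; start a new phrase
                flush()
            phrase.append(word)
        else:                          # any other tag breaks the pattern
            flush()
    flush()
    return candidates


tagged = [("unsupervised", "ADJ"), ("keyphrase", "NOUN"), ("extraction", "NOUN"),
          ("from", "ADP"), ("single", "ADJ"), ("documents", "NOUN")]
print(extract_candidates(tagged))
# -> ['unsupervised keyphrase extraction', 'single documents']
```

Changing the accepted tag sequence is what the KeyphraseVectorizers package exposes as a configurable part-of-speech pattern, which is how the approach adapts to new domains.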
Related papers
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study [27.139631284101007]
Zero-shot keyphrase extraction aims to build a keyphrase extractor without training by human-annotated data.
Recent work on pre-trained large language models shows promising performance in zero-shot settings.
arXiv Detail & Related papers (2023-12-23T03:50:49Z)
- Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
- Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training [66.64843711515341]
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text.
We call attention to a new setting named multilingual keyphrase generation.
We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
arXiv Detail & Related papers (2022-05-21T00:45:21Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- Learning Rich Representation of Keyphrases from Text [12.698835743464313]
We show how to learn task-specific language models aimed towards learning rich representation of keyphrases from text documents.
In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling with Replacement (KBIR).
In the generative setting, we introduce a new pre-training setup for BART - KeyBART, that reproduces the keyphrases related to the input text in the CatSeq format.
arXiv Detail & Related papers (2021-12-16T01:09:51Z)
- Template Controllable keywords-to-text Generation [16.255080737147384]
The model takes as input a set of unordered keywords and part-of-speech (POS) based template instructions.
The framework is based on the encode-attend-decode paradigm, where keywords and templates are encoded first, and the decoder judiciously attends over the contexts derived from the encoded keywords and templates to generate the sentences.
arXiv Detail & Related papers (2020-11-07T08:05:58Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Pre-training via Paraphrasing [96.79972492585112]
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
arXiv Detail & Related papers (2020-06-26T14:43:43Z)
- Keyphrase Prediction With Pre-trained Language Model [16.06425973336514]
We propose to divide keyphrase prediction into two subtasks, i.e., present keyphrase extraction (PKE) and absent keyphrase generation (AKG).
For PKE, we tackle this task as a sequence labeling problem with the pre-trained language model BERT.
For AKG, we introduce a Transformer-based architecture, which fully integrates the present keyphrase knowledge learned from PKE by the fine-tuned BERT.
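Treating PKE as sequence labeling means the tagger's per-token output must be decoded back into phrase spans. A minimal, stdlib-only decoder for a BIO scheme is sketched below; the tag scheme and helper name are illustrative assumptions, not details taken from the paper, and the upstream BERT tagger is not shown.

```python
def decode_bio(tokens, tags):
    """Turn per-token BIO labels into keyphrase strings.

    tokens: list of word strings.
    tags:   parallel list of labels in {"B", "I", "O"}, where "B" opens a
            keyphrase span and "I" continues it.
    """
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":                  # start a new span, closing any open one
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:    # continue the open span
            current.append(token)
        else:                           # "O" (or a stray "I") closes the span
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


print(decode_bio(["we", "use", "pre-trained", "language", "models", "here"],
                 ["O", "O", "B", "I", "I", "O"]))
# -> ['pre-trained language models']
```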
arXiv Detail & Related papers (2020-04-22T09:35:02Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
Three approaches address keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
arXiv Detail & Related papers (2020-02-13T09:48:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.