Related papers: MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

URL: http://arxiv.org/abs/2110.06651v1
Date: Wed, 13 Oct 2021 11:29:17 GMT
Title: MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
Authors: Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao
Abstract summary: Keyphrases are phrases in a document providing a concise summary of core content, helping readers to understand what the article is talking about in a minute. We propose a novel unsupervised keyword extraction method by leveraging the BERT-based model to select and rank candidate keyphrases with a MASK strategy.
Score: 41.941098507759015
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Keyphrases are phrases in a document providing a concise summary of core content, helping readers to understand what the article is talking about in a minute. However, existing unsupervised works are not robust enough to handle various types of documents owing to the mismatch of sequence length for comparison. In this paper, we propose a novel unsupervised keyword extraction method by leveraging the BERT-based model to select and rank candidate keyphrases with a MASK strategy. In addition, we further enhance the model, denoted as Keyphrases Extraction BERT (KPEBERT), via designing a compatible self-supervised task and conducting a contrast learning. We conducted extensive experimental evaluation to demonstrate the superiority and robustness of the proposed method as well as the effectiveness of KPEBERT.

Related papers

Learning Refined Document Representations for Dense Retrieval via Deliberate Thinking [58.69615583599489]
Deliberate Thinking based Retriever (Debater) is a novel approach that enhances document representations by incorporating a step-by-step thinking process.<n>Debater significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
MetaKP: On-Demand Keyphrase Generation [52.48698290354449]
We introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents. We present MetaKP, a large-scale benchmark comprising four datasets, 7500 documents, and 3760 goals across news and biomedical domains with human-annotated keyphrases. We demonstrate the potential of our method to serve as a general NLP infrastructure, exemplified by its application in epidemic event detection from social media.
arXiv Detail & Related papers (2024-06-28T19:02:59Z)
SimCKP: Simple Contrastive Learning of Keyphrase Representations [36.88517357720033]
We propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; and 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document.
arXiv Detail & Related papers (2023-10-12T11:11:54Z)
Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP. We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
Deep Keyphrase Completion [59.0413813332449]
Keyphrase provides accurate information of document content that is highly compact, concise, full of meanings, and widely used for discourse comprehension, organization, and text retrieval. We propose textitkeyphrase completion (KPC) to generate more keyphrases for document (e.g. scientific publication) taking advantage of document content along with a very limited number of known keyphrases. We name it textitdeep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction [48.55908127994688]
We propose a novel key-value matching model based on a graph neural network for VIE (MatchVIE) Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognitions to various semantics. We introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values.
arXiv Detail & Related papers (2021-06-24T12:06:29Z)
UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction [20.26899340581431]
Keyphrase Prediction task aims at predicting several keyphrases that can summarize the main idea of the given document. Mainstream KP methods can be categorized into purely generative approaches and integrated models with extraction and generation. We propose UniKeyphrase, a novel end-to-end learning framework that jointly learns to extract and generate keyphrases.
arXiv Detail & Related papers (2021-06-09T07:09:51Z)
Unsupervised Deep Keyphrase Generation [14.544869226959612]
Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated a remarkable success in this task, capable of predicting keyphrases that are even absent from a document. We present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any human annotation.
arXiv Detail & Related papers (2021-04-18T05:53:19Z)
A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents [29.479331909227998]
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents. We propose a simple and efficient joint learning approach based on the idea of self-distillation.
arXiv Detail & Related papers (2020-10-22T18:36:31Z)
Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system. It generates a ranked list of quotable paragraphs and spans of tokens from a given source document. We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)
Keyphrase Prediction With Pre-trained Language Model [16.06425973336514]
We propose to divide the keyphrase prediction into two subtasks, i.e., present keyphrase extraction (PKE) and absent keyphrase generation (AKG) For PKE, we tackle this task as a sequence labeling problem with the pre-trained language model BERT. For AKG, we introduce a Transformer-based architecture, which fully integrates the present keyphrase knowledge learned from PKE by the fine-tuned BERT.
arXiv Detail & Related papers (2020-04-22T09:35:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.