Keyphrase Extraction with Span-based Feature Representations
- URL: http://arxiv.org/abs/2002.05407v1
- Date: Thu, 13 Feb 2020 09:48:31 GMT
- Title: Keyphrase Extraction with Span-based Feature Representations
- Authors: Funan Mu, Zhenting Yu, LiFeng Wang, Yequan Wang, Qingyu Yin, Yibo Sun,
Liqun Liu, Teng Ma, Jing Tang, Xing Zhou
- Abstract summary: Keyphrases are capable of providing semantic metadata characterizing documents.
There are three approaches to keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
- Score: 13.790461555410747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyphrases are capable of providing semantic metadata characterizing
documents and producing an overview of the content of a document. Since
keyphrase extraction is able to facilitate the management, categorization, and
retrieval of information, it has received much attention in recent years. There
are three approaches to keyphrase extraction: (i) the traditional two-step
ranking method, (ii) sequence labeling, and (iii) generation using neural
networks. The two-step ranking approach relies on feature engineering, which is
labor intensive and domain dependent. Sequence labeling cannot handle
overlapping phrases. Generation methods (i.e., sequence-to-sequence neural
network models) overcome those shortcomings, so they have been widely studied
and achieve state-of-the-art performance. However, generation methods cannot
utilize context information effectively. In this paper, we propose a novel
Span Keyphrase Extraction model that extracts span-based feature representations
of keyphrases directly from all the content tokens. In this way, our model
obtains a representation for each candidate keyphrase and further learns to
capture the interaction between keyphrases in one document to get better
ranking results. In addition, because it operates on token spans, our model is
able to extract overlapping keyphrases. Experimental results on benchmark
datasets show that our proposed model outperforms existing methods by a large margin.
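The abstract describes the approach only at a high level. As a rough illustration of the span-based idea, enumerating candidate token spans, building a feature vector for each span from the content tokens, and scoring spans for ranking, here is a minimal sketch. It is not the authors' architecture; the encoder outputs, hidden size, maximum span length, and boundary-concatenation features are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Scores every candidate token span up to a maximum length (illustrative sketch)."""

    def __init__(self, hidden_size=128, max_span_len=5):
        super().__init__()
        self.max_span_len = max_span_len
        # Span feature = [start token vector ; end token vector] -> scalar score.
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, token_reprs):
        # token_reprs: (seq_len, hidden_size) contextual vectors for one document.
        seq_len = token_reprs.size(0)
        spans, feats = [], []
        for start in range(seq_len):
            for end in range(start, min(start + self.max_span_len, seq_len)):
                spans.append((start, end))
                feats.append(torch.cat([token_reprs[start], token_reprs[end]]))
        scores = self.scorer(torch.stack(feats)).squeeze(-1)
        return spans, scores  # overlapping spans are scored independently

# Toy usage with random stand-ins for encoder outputs.
token_reprs = torch.randn(12, 128)
spans, scores = SpanScorer()(token_reprs)
top3 = scores.topk(3).indices.tolist()
print([spans[i] for i in top3])
```

Because every span is scored independently, nothing prevents two overlapping spans from both ranking highly, which is the property the abstract highlights over sequence labeling.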
Related papers
- SimCKP: Simple Contrastive Learning of Keyphrase Representations [36.88517357720033]
We propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; and 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document.
arXiv Detail & Related papers (2023-10-12T11:11:54Z)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
arXiv Detail & Related papers (2023-09-09T09:41:36Z)
- Data Augmentation for Low-Resource Keyphrase Generation [46.52115499306222]
Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases).
Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire.
We present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains.
arXiv Detail & Related papers (2023-05-29T09:20:34Z)
- Neural Keyphrase Generation: Analysis and Evaluation [47.004575377472285]
We study various tendencies exhibited by three strong models: T5 (based on a pre-trained transformer), CatSeq-Transformer (a non-pretrained Transformer), and ExHiRD (based on a recurrent neural network).
We propose a novel metric framework, SoftKeyScore, to evaluate the similarity between two sets of keyphrases.
arXiv Detail & Related papers (2023-04-27T00:10:21Z)
- Applying Transformer-based Text Summarization for Keyphrase Generation [2.28438857884398]
Keyphrases are crucial for searching and systematizing scholarly documents.
In this paper, we experiment with popular transformer-based models for abstractive text summarization.
We show that summarization models are quite effective at generating keyphrases in terms of the full-match F1-score and BERTScore (a sketch of full-match F1 appears after this list).
We also investigate several ordering strategies to target keyphrases.
arXiv Detail & Related papers (2022-09-08T13:01:52Z)
- Importance Estimation from Multiple Perspectives for Keyphrase Extraction [34.51718374923614]
We propose a new approach, called KIEMP, to estimate the importance of a keyphrase from multiple perspectives.
KIEMP estimates the importance of a phrase with three modules: a chunking module to measure its syntactic accuracy, a ranking module to check its information saliency, and a matching module to judge the concept consistency between the phrase and the whole document.
Experimental results on six benchmark datasets show that KIEMP outperforms existing state-of-the-art keyphrase extraction approaches in most cases.
arXiv Detail & Related papers (2021-10-19T05:48:22Z)
- MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction [48.55908127994688]
We propose a novel key-value matching model based on a graph neural network for VIE (MatchVIE).
Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognition of various semantics.
We introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values.
arXiv Detail & Related papers (2021-06-24T12:06:29Z)
- Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding [3.7110020502717616]
We develop a multimodal key-phrase extraction approach, namely Phraseformer, using transformer and graph embedding techniques.
In Phraseformer, each keyword candidate is represented by a vector that concatenates its text representation and its structure (graph) learning representation (see the sketch after this list).
We analyze the performance of Phraseformer on three datasets, Inspec, SemEval-2010, and SemEval-2017, in terms of F1-score.
arXiv Detail & Related papers (2021-06-09T09:32:17Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq) generative framework is widely used for the KE task and has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
- A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents [29.479331909227998]
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document.
Most existing benchmark datasets for the task typically have limited numbers of annotated documents.
We propose a simple and efficient joint learning approach based on the idea of self-distillation.
arXiv Detail & Related papers (2020-10-22T18:36:31Z)
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
arXiv Detail & Related papers (2020-08-04T18:00:07Z)
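Several of the papers above report full-match F1 for keyphrase prediction. As a point of reference, here is a minimal sketch of how full-match F1 between predicted and gold keyphrases is typically computed, assuming lowercased exact matching; published evaluations often also apply stemming before comparison.

```python
def full_match_f1(predicted, gold):
    """Exact-match F1 between predicted and gold keyphrase sets (lowercased)."""
    pred = {p.lower().strip() for p in predicted}
    ref = {g.lower().strip() for g in gold}
    matches = len(pred & ref)
    precision = matches / len(pred) if pred else 0.0
    recall = matches / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(full_match_f1(["span extraction", "Keyphrase Ranking"],
                    ["keyphrase ranking", "neural networks"]))  # 0.5
```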
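The Phraseformer entry describes each keyword candidate as the concatenation of a text representation and a structure (graph) representation. A minimal sketch of that fusion step follows; the embedding sources and dimensions are assumptions made for illustration.

```python
import numpy as np

def candidate_vector(text_emb: np.ndarray, graph_emb: np.ndarray) -> np.ndarray:
    # Candidate keyword representation = [text embedding ; structure embedding].
    return np.concatenate([text_emb, graph_emb])

text_emb = np.random.rand(768)   # e.g., a contextual embedding from a transformer
graph_emb = np.random.rand(128)  # e.g., a node embedding from a document graph
print(candidate_vector(text_emb, graph_emb).shape)  # (896,)
```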
This list is automatically generated from the titles and abstracts of the papers on this site.