Capturing Global Informativeness in Open Domain Keyphrase Extraction
- URL: http://arxiv.org/abs/2004.13639v2
- Date: Fri, 17 Sep 2021 09:37:12 GMT
- Title: Capturing Global Informativeness in Open Domain Keyphrase Extraction
- Authors: Si Sun, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Jie Bao
- Abstract summary: Open-domain KeyPhrase Extraction (KPE) aims to extract keyphrases from documents without domain or quality restrictions.
This paper presents JointKPE, an open-domain KPE architecture built on pre-trained language models.
JointKPE learns to rank keyphrases by estimating their informativeness in the entire document and is jointly trained on the keyphrase chunking task.
- Score: 40.57116173502994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain KeyPhrase Extraction (KPE) aims to extract keyphrases from
documents without domain or quality restrictions, e.g., web pages with varied
domains and quality levels. Recently, neural methods have shown promising results in
many KPE tasks due to their powerful capacity for modeling contextual semantics
of the given documents. However, we empirically show that most neural KPE
methods prefer to extract keyphrases with good phraseness, such as short and
entity-style n-grams, instead of globally informative keyphrases from
open-domain documents. This paper presents JointKPE, an open-domain KPE
architecture built on pre-trained language models, which can capture both local
phraseness and global informativeness when extracting keyphrases. JointKPE
learns to rank keyphrases by estimating their informativeness in the entire
document and is jointly trained on the keyphrase chunking task to guarantee the
phraseness of keyphrase candidates. Experiments on two large KPE datasets with
diverse domains, OpenKP and KP20k, demonstrate the effectiveness of JointKPE on
different pre-trained variants in open-domain scenarios. Further analyses
reveal the significant advantages of JointKPE in predicting long and non-entity
keyphrases, which are challenging for previous neural KPE methods. Our code is
publicly available at https://github.com/thunlp/BERT-KPE.
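The abstract names two ingredients: a document-level (global) informativeness score used to rank candidates, and a keyphrase chunking objective trained jointly with it. The pure-Python sketch below shows one way those pieces could fit together, under loudly stated assumptions. In the real model the per-occurrence span scores and chunking logits would come from a pre-trained language model; here they are hand-written toy numbers. Max pooling over all occurrences of a phrase, the pairwise margin ranking loss, and the unweighted sum of the two losses are illustrative choices, not details confirmed by this abstract. All function names are hypothetical; the authors' actual implementation is in the repository linked above.

```python
import math

def enumerate_ngrams(tokens, max_len=3):
    """Return every (start, end) token span of length 1..max_len (end exclusive)."""
    return [(s, s + n) for s in range(len(tokens))
            for n in range(1, max_len + 1) if s + n <= len(tokens)]

def phrase_of(tokens, span):
    """Lower-cased surface string of a span; identical strings count as one phrase."""
    start, end = span
    return " ".join(tokens[start:end]).lower()

def global_scores(tokens, local_scores):
    """Collapse per-occurrence (local) scores into one document-level score per phrase.

    Assumption: max pooling over all occurrences of the same phrase.
    """
    best = {}
    for span, score in local_scores.items():
        phrase = phrase_of(tokens, span)
        best[phrase] = max(score, best.get(phrase, -math.inf))
    return best

def rank_keyphrases(tokens, local_scores, k=5):
    """Top-k phrases by global score (sorted() is stable, so ties keep first-seen order)."""
    scores = global_scores(tokens, local_scores)
    return sorted(scores, key=scores.get, reverse=True)[:k]

def margin_rank_loss(scores, gold, margin=1.0):
    """Pairwise hinge loss: each gold phrase should beat each non-gold phrase by `margin`."""
    pos = [s for p, s in scores.items() if p in gold]
    neg = [s for p, s in scores.items() if p not in gold]
    pairs = [(a, b) for a in pos for b in neg]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - a + b) for a, b in pairs) / len(pairs)

def chunk_loss(tokens, chunk_logits, gold, eps=1e-7):
    """Mean binary cross-entropy for 'is this span a keyphrase?' (the chunking task)."""
    total = 0.0
    for span, logit in chunk_logits.items():
        label = 1.0 if phrase_of(tokens, span) in gold else 0.0
        prob = min(max(1.0 / (1.0 + math.exp(-logit)), eps), 1.0 - eps)
        total -= label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob)
    return total / len(chunk_logits)

def joint_loss(tokens, local_scores, chunk_logits, gold):
    """Assumed joint objective: unweighted sum of the ranking and chunking losses."""
    return (margin_rank_loss(global_scores(tokens, local_scores), gold)
            + chunk_loss(tokens, chunk_logits, gold))

# Toy demo: hand-written scores stand in for a pre-trained encoder's outputs.
tokens = ["keyphrase", "extraction", "finds", "keyphrase", "candidates"]
spans = enumerate_ngrams(tokens, max_len=2)
toy = {(0, 2): 2.0, (0, 1): 0.5, (3, 4): 1.5}
local_scores = {span: toy.get(span, 0.0) for span in spans}
gold = {"keyphrase extraction"}

top = rank_keyphrases(tokens, local_scores, k=2)
# The same toy numbers are reused as chunking logits purely for brevity.
loss = joint_loss(tokens, local_scores, local_scores, gold)
print(top)  # ['keyphrase extraction', 'keyphrase']
```

The word "keyphrase" occurs twice with local scores 0.5 and 1.5; after pooling it is ranked once with score 1.5. This illustrates the difference between scoring individual n-gram occurrences and scoring a phrase over the entire document.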
Related papers
- Pre-Trained Language Models for Keyphrase Prediction: A Review [2.7869482272876622]
Keyphrase Prediction (KP) is essential for identifying keyphrases in a document that can summarize its content.
Recent Natural Language Processing advances have developed more efficient KP models using deep learning techniques.
This paper extensively examines the topic of pre-trained language models for keyphrase prediction (PLM-KP).
(2024-09-02T09:15:44Z)
- Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction [9.307602861891926]
Keyphrase extraction is an important task in Natural Language Processing.
In this study, we propose Diff-KPE to guide the text diffusion process for generating enhanced keyphrase representations.
Experiments show that Diff-KPE outperforms existing KPE methods on a large open domain keyphrase extraction benchmark, OpenKP, and a scientific domain dataset, KP20K.
(2023-08-17T02:26:30Z)
- Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
(2022-12-19T11:26:23Z)
- LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents [48.84086818702328]
Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval.
The vast majority of benchmark datasets for this task come from the scientific domain and contain only the document title and abstract.
This presents three challenges for real-world applications: human-written summaries are unavailable for most documents, the documents are almost always long, and a high percentage of KPs are directly found beyond the limited context of title and abstract.
(2022-03-29T08:44:57Z)
- Unsupervised Keyphrase Extraction via Interpretable Neural Networks [27.774524511005172]
Keyphrases that are most useful for predicting the topic of a text are important keyphrases.
INSPECT is a self-explaining neural framework for identifying influential keyphrases.
We show that INSPECT achieves state-of-the-art results in unsupervised keyphrase extraction across four diverse datasets.
(2022-03-15T04:30:47Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate, highly compact, concise, and meaningful information about document content, and are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication) by taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with the known keyphrases via a deep learning framework.
(2021-10-29T07:15:35Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
(2021-09-26T07:45:53Z)
- UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction [20.26899340581431]
The keyphrase prediction (KP) task aims to predict several keyphrases that summarize the main idea of a given document.
Mainstream KP methods can be categorized into purely generative approaches and integrated models with extraction and generation.
We propose UniKeyphrase, a novel end-to-end learning framework that jointly learns to extract and generate keyphrases.
(2021-06-09T07:09:51Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
Recent Sequence-to-Sequence (Seq2Seq) generative frameworks are widely used for the KE task and have obtained competitive performance on various benchmarks.
In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
(2020-10-24T08:11:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.