Improving Keyphrase Extraction with Data Augmentation and Information
Filtering
- URL: http://arxiv.org/abs/2209.04951v1
- Date: Sun, 11 Sep 2022 22:38:02 GMT
- Authors: Amir Pouran Ben Veyseh, Nicole Meister, Franck Dernoncourt, Thien Huu
Nguyen
- Abstract summary: Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
- Score: 67.43025048639333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keyphrase extraction is one of the essential tasks for document understanding
in NLP. While the majority of the prior works are dedicated to the formal
setting, e.g., books, news or web-blogs, informal texts such as video
transcripts are less explored. To address this limitation, in this work we
present a novel corpus and method for keyphrase extraction from the transcripts
of the videos streamed on the Behance platform. More specifically, in this
work, a novel data augmentation is proposed to enrich the model with the
background knowledge about the keyphrase extraction task from other domains.
Extensive experiments on the proposed dataset show the effectiveness of
the introduced method.
Related papers
- Data Augmentation for Low-Resource Keyphrase Generation [46.52115499306222]
Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases).
Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire.
We present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains.
arXiv Detail & Related papers (2023-05-29T09:20:34Z)
- PatternRank: Leveraging Pretrained Language Models and Part of Speech for Unsupervised Keyphrase Extraction [0.6767885381740952]
We present PatternRank, which leverages pretrained language models and part-of-speech for unsupervised keyphrase extraction from single documents.
Our experiments show PatternRank achieves higher precision, recall and F1-scores than previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-11T08:23:54Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- Keyphrase Generation Beyond the Boundaries of Title and Abstract [28.56508031460787]
Keyphrase generation aims at generating phrases (keyphrases) that best describe a given document.
In this work, we explore whether the integration of additional data from semantically similar articles or from the full text of the given article can be helpful for a neural keyphrase generation model.
We discover that adding sentences from the full text particularly in the form of summary of the article can significantly improve the generation of both types of keyphrases.
arXiv Detail & Related papers (2021-12-13T16:33:01Z)
- Enhancing Keyphrase Extraction from Academic Articles with their Reference Information [12.769066804715697]
Keyphrases that concisely summarize a document help users quickly find and understand it.
Title information in references also contains author-assigned keyphrases.
Experiments show reference information can increase precision, recall, and F1 of automatic keyphrase extraction.
arXiv Detail & Related papers (2021-11-28T11:14:16Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate information about document content in a highly compact, concise, and meaningful form, and are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication), taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
- Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset [24.91326715164367]
Multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents.
We present here the first literature review and the first dataset for the task, MK-DUC-01, which can serve as a new benchmark.
arXiv Detail & Related papers (2021-10-03T19:10:28Z)
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [103.85002875155551]
We propose a novel generalized distillation method, TeachText, for exploiting large-scale language pretraining.
We extend our method to video side modalities and show that we can effectively reduce the number of used modalities at test time.
Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and adds no computational overhead at test time.
arXiv Detail & Related papers (2021-04-16T17:55:28Z)
- CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z)
- A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents [29.479331909227998]
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document.
Most existing benchmark datasets for the task typically have limited numbers of annotated documents.
We propose a simple and efficient joint learning approach based on the idea of self-distillation.
arXiv Detail & Related papers (2020-10-22T18:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.