Data Augmentation for Low-Resource Keyphrase Generation
- URL: http://arxiv.org/abs/2305.17968v1
- Date: Mon, 29 May 2023 09:20:34 GMT
- Title: Data Augmentation for Low-Resource Keyphrase Generation
- Authors: Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea
- Abstract summary: Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases).
Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire.
We present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains.
- Score: 46.52115499306222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keyphrase generation is the task of summarizing the contents of any given
article into a few salient phrases (or keyphrases). Existing works for the task
mostly rely on large-scale annotated datasets, which are not easy to acquire.
Very few works address the problem of keyphrase generation in low-resource
settings, but they still rely on a lot of additional unlabeled data for
pretraining and on automatic methods for pseudo-annotations. In this paper, we
present data augmentation strategies specifically to address keyphrase
generation in purely resource-constrained domains. We design techniques that
use the full text of the articles to improve both present and absent keyphrase
generation. We test our approach comprehensively on three datasets and show
that the data augmentation strategies consistently improve the state-of-the-art
performance. We release our source code at
https://github.com/kgarg8/kpgen-lowres-data-aug.
Related papers
- Self-Compositional Data Augmentation for Scientific Keyphrase Generation [28.912937922090038]
We present a self-compositional data augmentation method for keyphrase generation.
We measure the relatedness of training documents based on their shared keyphrases, and combine similar documents to generate synthetic samples.
arXiv Detail & Related papers (2024-11-05T12:22:51Z)
- Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
- Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training [66.64843711515341]
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text.
We call attention to a new setting named multilingual keyphrase generation.
We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
arXiv Detail & Related papers (2022-05-21T00:45:21Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- Keyphrase Generation Beyond the Boundaries of Title and Abstract [28.56508031460787]
Keyphrase generation aims at generating phrases (keyphrases) that best describe a given document.
In this work, we explore whether the integration of additional data from semantically similar articles or from the full text of the given article can be helpful for a neural keyphrase generation model.
We discover that adding sentences from the full text, particularly in the form of a summary of the article, can significantly improve the generation of both types of keyphrases.
arXiv Detail & Related papers (2021-12-13T16:33:01Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate information about document content that is highly compact, concise, and full of meaning, and are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication), taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
- A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents [29.479331909227998]
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document.
Most existing benchmark datasets for the task typically have limited numbers of annotated documents.
We propose a simple and efficient joint learning approach based on the idea of self-distillation.
arXiv Detail & Related papers (2020-10-22T18:36:31Z)
- Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing [85.35582118010608]
Task-oriented semantic parsing is a critical component of virtual assistants.
Recent advances in deep learning have enabled several approaches to successfully parse more complex queries.
We propose a novel method that outperforms a supervised neural model at a 10-fold data reduction.
arXiv Detail & Related papers (2020-10-07T17:47:53Z)
- PerKey: A Persian News Corpus for Keyphrase Extraction and Generation [1.192436948211501]
PerKey is a corpus of 553k news articles from six Persian news websites and agencies with relatively high-quality, author-extracted keyphrases.
The data was assessed by humans to ensure the quality of the keyphrases.
arXiv Detail & Related papers (2020-09-25T14:36:41Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
There are three approaches to keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
arXiv Detail & Related papers (2020-02-13T09:48:31Z)