Keyphrase Generation Beyond the Boundaries of Title and Abstract
- URL: http://arxiv.org/abs/2112.06776v1
- Date: Mon, 13 Dec 2021 16:33:01 GMT
- Title: Keyphrase Generation Beyond the Boundaries of Title and Abstract
- Authors: Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea
- Abstract summary: Keyphrase generation aims at generating phrases (keyphrases) that best describe a given document.
In this work, we explore whether the integration of additional data from semantically similar articles or from the full text of the given article can be helpful for a neural keyphrase generation model.
We discover that adding sentences from the full text, particularly in the form of a summary of the article, can significantly improve the generation of both types of keyphrases.
- Score: 28.56508031460787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keyphrase generation aims at generating phrases (keyphrases) that best
describe a given document. In scholarly domains, current approaches to this
task are neural approaches and have largely worked with only the title and
abstract of the articles. In this work, we explore whether the integration of
additional data from semantically similar articles or from the full text of the
given article can be helpful for a neural keyphrase generation model. We
discover that adding sentences from the full text, particularly in the form of a
summary of the article, can significantly improve the generation of both types
of keyphrases: those that are present in and those that are absent from the
title and abstract. Experimental results on three acclaimed models, along with
one of the latest transformer models suitable for longer documents, the
Longformer Encoder-Decoder (LED), validate this observation. We also present a new
large-scale scholarly dataset FullTextKP for keyphrase generation, which we use
for our experiments. Unlike prior large-scale datasets, FullTextKP includes the
full text of the articles alongside title and abstract. We will release the
source code to stimulate research on the proposed ideas.
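The abstract separates keyphrases that are present in the title and abstract from those that are absent, and it reports gains from appending full-text sentences (for example, a summary) to the model input. The sketch below illustrates both ideas under simple assumptions that are not taken from the paper: a lowercase alphanumeric tokenizer with no stemming, a contiguous-token match as the test for "present", and a hypothetical `<sep>` separator with word-level truncation. The function names (`split_present_absent`, `build_input`) and the example strings are illustrative only, not the authors' released code.

```python
import re
from typing import List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_span(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """Return True if `needle` occurs as a contiguous token run in `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(
        list(haystack[i:i + n]) == list(needle)
        for i in range(len(haystack) - n + 1)
    )


def split_present_absent(
    keyphrases: List[str], title: str, abstract: str
) -> Tuple[List[str], List[str]]:
    """Partition gold keyphrases into present (in title or abstract) and absent."""
    # Title and abstract are checked separately so a phrase cannot match
    # across the boundary between them.
    sources = [tokenize(title), tokenize(abstract)]
    present: List[str] = []
    absent: List[str] = []
    for kp in keyphrases:
        kp_tokens = tokenize(kp)
        if any(contains_span(src, kp_tokens) for src in sources):
            present.append(kp)
        else:
            absent.append(kp)
    return present, absent


def build_input(
    title: str,
    abstract: str,
    extra_sentences: List[str],
    max_words: int = 1024,
    sep: str = "<sep>",
) -> str:
    """Concatenate title, abstract, and extra full-text sentences (e.g., a summary).

    `sep` and `max_words` are hypothetical choices. Truncation keeps the prefix,
    so the appended full-text sentences are the first tokens to be dropped.
    """
    words = f"{title} {sep} {abstract}".split()
    if extra_sentences:
        words += [sep] + " ".join(extra_sentences).split()
    return " ".join(words[:max_words])


if __name__ == "__main__":
    # Toy inputs for illustration only.
    title = "Keyphrase Generation Beyond the Boundaries of Title and Abstract"
    abstract = "We explore adding full-text sentences to a neural keyphrase generation model."
    gold = ["keyphrase generation", "full text", "longformer"]
    print(split_present_absent(gold, title, abstract))
    # (['keyphrase generation', 'full text'], ['longformer'])
    print(build_input(title, abstract, ["A one-sentence summary of the article."], max_words=40))
```

Common evaluation setups also stem tokens (for example, with a Porter stemmer) before matching; that step is omitted here to keep the sketch dependency-free.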
Related papers
- Cross-Domain Robustness of Transformer-based Keyphrase Generation [1.8492669447784602]
A list of keyphrases is an important element of a text in databases and repositories of electronic documents.
In our experiments, abstractive text summarization models fine-tuned for keyphrase generation achieve quite high scores on a target text corpus.
We present an evaluation of the fine-tuned BART models for the keyphrase selection task across six benchmark corpora.
(arXiv, 2023-12-17T12:27:15Z)
- Data Augmentation for Low-Resource Keyphrase Generation [46.52115499306222]
Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases).
Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire.
We present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains.
(arXiv, 2023-05-29T09:20:34Z)
- Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from videos streamed on the Behance platform.
(arXiv, 2022-09-11T22:38:02Z)
- LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents [48.84086818702328]
Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval.
The vast majority of the benchmark datasets for this task come from the scientific domain and contain only the title and abstract of each document.
This presents three challenges for real-world applications: human-written summaries are unavailable for most documents, the documents are almost always long, and a high percentage of KPs appear directly in the text beyond the limited context of the title and abstract.
(arXiv, 2022-03-29T08:44:57Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate, highly compact, concise, and meaningful information about document content, and they are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication) by taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with the known keyphrases via a deep learning framework.
(arXiv, 2021-10-29T07:15:35Z)
- A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents [29.479331909227998]
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document.
Most existing benchmark datasets for the task typically have limited numbers of annotated documents.
We propose a simple and efficient joint learning approach based on the idea of self-distillation.
(arXiv, 2020-10-22T18:36:31Z)
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
(arXiv, 2020-08-04T18:00:07Z)
- Keyphrase Generation with Cross-Document Attention [28.565813544820553]
Keyphrase generation aims to produce a set of phrases summarizing the essentials of a given document.
We propose CDKGen, a Transformer-based keyphrase generator, which extends the Transformer with global attention across documents.
We also adopt a copy mechanism that enhances the model by selecting appropriate words from the documents to handle out-of-vocabulary words in keyphrases.
(arXiv, 2020-04-21T07:58:27Z)
- Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation [98.98411895250774]
We propose generating multiple headlines that incorporate keyphrases of user interest.
The proposed method achieves state-of-the-art results in terms of quality and diversity.
(arXiv, 2020-04-08T08:30:05Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
There are three main approaches to keyphrase extraction: (i) traditional two-step ranking methods, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
(arXiv, 2020-02-13T09:48:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.