Keyphrase Generation for Scientific Document Retrieval
- URL: http://arxiv.org/abs/2106.14726v1
- Date: Mon, 28 Jun 2021 13:55:49 GMT
- Title: Keyphrase Generation for Scientific Document Retrieval
- Authors: Florian Boudin, Ygor Gallina, Akiko Aizawa
- Abstract summary: This study provides empirical evidence that sequence-to-sequence models can significantly improve document retrieval performance.
We introduce a new extrinsic evaluation framework that allows for a better understanding of the limitations of keyphrase generation models.
- Score: 28.22174864849121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequence-to-sequence models have led to significant progress in keyphrase
generation, but it remains unknown whether they are reliable enough to be
beneficial for document retrieval. This study provides empirical evidence that
such models can significantly improve retrieval performance, and introduces a
new extrinsic evaluation framework that allows for a better understanding of
the limitations of keyphrase generation models. Using this framework, we point
out and discuss the difficulties encountered when supplementing documents with
keyphrases that are not present in their text, and when generalizing models across domains.
Our code is available at https://github.com/boudinfl/ir-using-kg
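The extrinsic evaluation described in the abstract comes down to indexing documents that have been supplemented with model-generated keyphrases and measuring the resulting retrieval quality. Below is a minimal sketch of that setup, not the pipeline from the linked repository: the `generate_keyphrases` placeholder stands in for any trained sequence-to-sequence model, and the third-party `rank_bm25` package is an assumed, illustrative choice of retrieval backend.

```python
# Minimal sketch of retrieval over keyphrase-supplemented documents.
# Not the paper's code (see https://github.com/boudinfl/ir-using-kg);
# `generate_keyphrases` is a hypothetical stand-in for a seq2seq model,
# and rank_bm25 is an assumed choice of BM25 implementation.
from rank_bm25 import BM25Okapi

def generate_keyphrases(doc: str) -> list[str]:
    # Placeholder: a trained keyphrase generation model would be called here.
    return []

def build_index(docs: list[str]) -> BM25Okapi:
    # Append generated keyphrases to each document before indexing, so that
    # keyphrases absent from the text can still contribute to term matching.
    expanded = [doc + " " + " ".join(generate_keyphrases(doc)) for doc in docs]
    return BM25Okapi([d.lower().split() for d in expanded])

docs = [
    "Sequence-to-sequence models for keyphrase generation.",
    "Graph neural networks for scientific document retrieval.",
]
index = build_index(docs)
query = "keyphrase generation".lower().split()
scores = index.get_scores(query)                      # one BM25 score per document
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
```

Under this setup, retrieval quality with and without the generated keyphrases can be compared on the same index and queries, which is the kind of extrinsic comparison the framework is built around.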
Related papers
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Neural Keyphrase Generation: Analysis and Evaluation [47.004575377472285]
We study various tendencies exhibited by three strong models: T5 (based on a pre-trained transformer), CatSeq-Transformer (a non-pretrained Transformer), and ExHiRD (based on a recurrent neural network).
We propose a novel metric framework, SoftKeyScore, to evaluate the similarity between two sets of keyphrases.
arXiv Detail & Related papers (2023-04-27T00:10:21Z)
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate, highly compact, and concise information about document content, and are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication), taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with the known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
- Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship guided Paraphrase Generation) for document-level paraphrase generation.
We use a graph GRU to encode the coherence relationship graph and obtain a coherence-aware representation for each sentence.
Our model can generate document paraphrases with greater diversity and semantic preservation.
arXiv Detail & Related papers (2021-09-15T05:53:40Z)
- Heterogeneous Graph Neural Networks for Keyphrase Generation [13.841525616800908]
We propose a novel graph-based method that can capture explicit knowledge from related references.
Our model first retrieves document-keyphrase pairs similar to the source document from a pre-defined index as references.
To guide the decoding process, a hierarchical attention and copy mechanism is introduced, which directly copies appropriate words from both the source document and its references.
arXiv Detail & Related papers (2021-09-10T07:17:07Z)
- Unsupervised Deep Keyphrase Generation [14.544869226959612]
Keyphrase generation aims to summarize long documents with a collection of salient phrases.
Deep neural models have demonstrated remarkable success in this task and are capable of predicting keyphrases that are even absent from a document.
We present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any human annotation.
arXiv Detail & Related papers (2021-04-18T05:53:19Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
Three approaches address keyphrase extraction: (i) traditional two-step ranking methods, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens (a minimal span-scoring sketch follows below).
arXiv Detail & Related papers (2020-02-13T09:48:31Z)
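For the span-based extraction idea above, the following is a generic, illustrative sketch rather than the model from that paper: candidate spans are enumerated over the content tokens, each span is represented by pooling its token representations, and a linear layer scores the candidates. The mean-pooling choice, the hidden size, and the maximum span length are assumptions made only for this example.

```python
# Illustrative sketch of span-based keyphrase scoring (PyTorch), not the
# model from the cited paper. Pooling, hidden size, and max span length
# are assumptions for the example.
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, token_reprs: torch.Tensor, max_span_len: int = 5):
        # token_reprs: (seq_len, hidden_dim) contextual representations of content tokens.
        seq_len = token_reprs.size(0)
        spans, feats = [], []
        for start in range(seq_len):
            for end in range(start + 1, min(start + max_span_len, seq_len) + 1):
                # Span representation = mean of its token representations.
                feats.append(token_reprs[start:end].mean(dim=0))
                spans.append((start, end))
        scores = self.scorer(torch.stack(feats)).squeeze(-1)  # one score per candidate span
        return spans, scores

# Toy usage: random "contextual" token vectors for an 8-token document.
scorer = SpanScorer()
spans, scores = scorer(torch.randn(8, 128))
top_spans = [spans[i] for i in scores.topk(3).indices.tolist()]  # highest-scoring spans
```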