Related papers: Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset

URL: http://arxiv.org/abs/2110.01073v1
Date: Sun, 3 Oct 2021 19:10:28 GMT
Title: Multi-Document Keyphrase Extraction: A Literature Review and the First Dataset
Authors: Ori Shapira, Ramakanth Pasunuru, Ido Dagan, Yael Amsterdamer
Abstract summary: Multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents. We present here the first literature review and the first dataset for the task, MK-DUC-01, which can serve as a new benchmark.
Score: 24.91326715164367
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Keyphrase extraction has been comprehensively researched within the single-document setting, with an abundance of methods and a wealth of datasets. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no dataset existed for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To initiate this pursuit, we present here the first literature review and the first dataset for the task, MK-DUC-01, which can serve as a new benchmark. We test several keyphrase extraction baselines on our data and show their results.

Related papers

Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-Sample Aggregation on Large Language Models [52.829293635314194]
Keyphrase generation is a long-standing NLP task for automatically generating keyphrases for a given document. We focus on the zero-shot capabilities of open-source instruction-tuned LLMs (Phi-3, Llama-3) and the closed-source GPT-4o for this task.
arXiv Detail & Related papers (2025-03-01T19:38:57Z)
LongKey: Keyphrase Extraction for Long Documents [3.832358080820378]
LongKey is a novel framework for extracting keyphrases from lengthy documents. LongKey consistently outperforms existing unsupervised and language model-based keyphrase extraction methods.
arXiv Detail & Related papers (2024-11-26T20:26:47Z)
BibRank: Automatic Keyphrase Extraction Platform Using Metadata [0.0]
This paper introduces a platform that integrates keyphrase datasets and facilitates the evaluation of keyphrase extraction algorithms. The platform includes BibRank, an automatic keyphrase extraction algorithm that leverages a rich dataset obtained by parsing bibliographic data in BibTeX format.
arXiv Detail & Related papers (2023-10-13T14:44:34Z)
Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP. We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework for visually rich documents. Text reading and information extraction can reinforce each other via a well-designed multi-modal context block. The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents [48.84086818702328]
Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval. The vast majority of benchmark datasets for this task are from the scientific domain and contain only the document title and abstract. This presents three challenges for real-world applications: human-written summaries are unavailable for most documents, the documents are almost always long, and a high percentage of KPs are found only beyond the limited context of the title and abstract.
arXiv Detail & Related papers (2022-03-29T08:44:57Z)
Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate information about document content that is highly compact, concise, and full of meaning, and they are widely used for discourse comprehension, organization, and text retrieval. We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication) by taking advantage of the document content along with a very limited number of known keyphrases. We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents [29.479331909227998]
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents. We propose a simple and efficient joint learning approach based on the idea of self-distillation.
arXiv Detail & Related papers (2020-10-22T18:36:31Z)
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network. Multimodal visual and textual features of text reading are fused for information extraction. Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
GLEAKE: Global and Local Embedding Automatic Keyphrase Extraction [1.0681288493631977]
We introduce Global and Local Embedding Automatic Keyphrase Extractor (GLEAKE) for the task of automatic keyphrase extraction. GLEAKE uses single and multi-word embedding techniques to explore the syntactic and semantic aspects of the candidate phrases. It refines the most significant phrases as a final set of keyphrases.
arXiv Detail & Related papers (2020-05-19T20:24:02Z)
From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area that aims to create a short and condensed version of an original document. In real-world applications, most of the data is not in a plain text format. This paper surveys these new summarization tasks and approaches in real-world applications.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.