Unsupervised Key-phrase Extraction and Clustering for Classification
Scheme in Scientific Publications
- URL: http://arxiv.org/abs/2101.09990v2
- Date: Mon, 8 Feb 2021 20:31:42 GMT
- Title: Unsupervised Key-phrase Extraction and Clustering for Classification
Scheme in Scientific Publications
- Authors: Xiajing Li, Marios Daoutis
- Abstract summary: We investigate possible ways of automating parts of the Systematic Mapping (SM) and Systematic Review (SR) process.
Key-phrases are extracted from scientific documents using unsupervised methods and are then used to construct the corresponding Classification Scheme.
We also explore how clustering can be used to group related key-phrases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several methods have been explored for automating parts of Systematic Mapping
(SM) and Systematic Review (SR) methodologies. Challenges typically revolve
around the gaps in semantic understanding of text, as well as lack of domain
and background knowledge necessary to bridge that gap. In this paper we
investigate possible ways of automating parts of the SM/SR process, i.e. that
of extracting keywords and key-phrases from scientific documents using
unsupervised methods, which are then used as a basis to construct the
corresponding Classification Scheme using semantic key-phrase clustering
techniques. Specifically, we explore the effect of an ensemble score measure on
key-phrase extraction, the use of semantic-network-based word embeddings to
represent phrase semantics, and how clustering can be used to group related
key-phrases. The evaluation is conducted on a dataset of publications
pertaining to the domain of "Explainable
AI" which we constructed using standard publicly available digital libraries
and sets of indexing terms (keywords). Results show that the ensemble ranking
score improves key-phrase extraction performance. Word embeddings based on the
ConceptNet semantic network perform similarly to contextualized word
embeddings, while being computationally more efficient. Finally, semantic
key-phrase clustering at term level can group similar terms together in a way
that can be suitable for a classification scheme.
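The abstract names two steps, an ensemble ranking score over extracted key-phrases and a grouping of related key-phrases by embedding similarity, without giving formulas. The following is a minimal sketch of both ideas under stated assumptions, not the authors' implementation: averaging min-max-normalised scores is one possible ensemble rule, the two score tables and the 2-D vectors are made-up toy data standing in for real extractor outputs and ConceptNet-style embeddings, and the greedy threshold clustering (with its hypothetical `threshold` parameter) is a stand-in for whichever clustering algorithm the paper uses.

```python
from math import sqrt

def min_max(scores):
    """Rescale a {phrase: score} table to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {p: (s - lo) / span if span else 0.0 for p, s in scores.items()}

def ensemble_rank(score_tables):
    """Average normalised scores across extractors (assumed ensemble rule).

    A phrase missing from one extractor's table contributes 0 for it.
    """
    normalised = [min_max(t) for t in score_tables]
    phrases = set().union(*normalised)
    combined = {
        p: sum(t.get(p, 0.0) for t in normalised) / len(normalised)
        for p in phrases
    }
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def greedy_cluster(vectors, threshold=0.8):
    """Put each phrase in the first cluster whose seed vector is similar
    enough; otherwise start a new cluster with the phrase as its seed."""
    clusters = []  # list of (seed_vector, member_phrases)
    for phrase, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(phrase)
                break
        else:
            clusters.append((vec, [phrase]))
    return [members for _, members in clusters]

# Toy scores from two hypothetical unsupervised extractors.
statistical = {"explainable ai": 0.9, "neural network": 0.6, "black box model": 0.3}
graph_based = {"explainable ai": 0.5, "black box model": 0.4, "interpretability": 0.1}
ranking = ensemble_rank([statistical, graph_based])

# Toy 2-D vectors standing in for phrase embeddings.
toy_vectors = {
    "explainable ai": (1.0, 0.1),
    "interpretability": (0.9, 0.2),
    "neural network": (0.1, 1.0),
}
clusters = greedy_cluster(toy_vectors)

print([phrase for phrase, _ in ranking])
print(clusters)
```

In this toy run, a phrase ranked highly by both extractors rises to the top of the ensemble ranking, and the two phrases with nearly parallel vectors end up in the same cluster. Each resulting cluster could then be reviewed as a candidate category for a classification scheme.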
Related papers
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Open-Vocabulary Segmentation with Semantic-Assisted Calibration [73.39366775301382]
We study open-vocabulary segmentation (OVS) through calibrating in-vocabulary and domain-biased embedding space with contextual prior of CLIP.
We present a Semantic-assisted CAlibration Network (SCAN) to achieve state-of-the-art performance on open-vocabulary segmentation benchmarks.
arXiv Detail & Related papers (2023-12-07T07:00:09Z)
- Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS).
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
- A Process for Topic Modelling Via Word Embeddings [0.0]
This work combines algorithms based on word embeddings, dimensionality reduction, and clustering.
The objective is to obtain topics from a set of unclassified texts.
arXiv Detail & Related papers (2023-10-06T15:10:35Z)
- CLIP-GCD: Simple Language Guided Generalized Category Discovery [21.778676607030253]
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data.
Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods.
We propose to leverage multi-modal (vision and language) models in two complementary ways.
arXiv Detail & Related papers (2023-05-17T17:55:33Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrase provides accurate information of document content that is highly compact, concise, full of meanings, and widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g. a scientific publication), taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
- UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction [20.26899340581431]
The Keyphrase Prediction (KP) task aims at predicting several keyphrases that can summarize the main idea of the given document.
Mainstream KP methods can be categorized into purely generative approaches and integrated models with extraction and generation.
We propose UniKeyphrase, a novel end-to-end learning framework that jointly learns to extract and generate keyphrases.
arXiv Detail & Related papers (2021-06-09T07:09:51Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used for the KE task and has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
- WSRNet: Joint Spotting and Recognition of Handwritten Words [38.212002652391]
The proposed network is comprised of a non-recurrent CTC branch and a Seq2Seq branch that is further augmented with an Autoencoding module.
We show how to further process these representations with binarization and a retraining scheme to provide compact and highly efficient descriptors.
arXiv Detail & Related papers (2020-08-17T06:22:05Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
There are three approaches to keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
arXiv Detail & Related papers (2020-02-13T09:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.