Beyond original Research Articles Categorization via NLP
- URL: http://arxiv.org/abs/2309.07020v1
- Date: Wed, 13 Sep 2023 15:23:30 GMT
- Title: Beyond original Research Articles Categorization via NLP
- Authors: Rosanna Turrisi
- Abstract summary: The study leverages the power of pre-trained language models, specifically SciBERT, to extract meaningful representations of abstracts from the ArXiv dataset.
The results demonstrate that the proposed approach captures subject information more effectively than the traditional arXiv labeling system.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes a novel approach to text categorization -- for unknown
categories -- in the context of scientific literature, using Natural Language
Processing techniques. The study leverages the power of pre-trained language
models, specifically SciBERT, to extract meaningful representations of
abstracts from the ArXiv dataset. Text categorization is performed using the
K-Means algorithm, and the optimal number of clusters is determined based on
the Silhouette score. The results demonstrate that the proposed approach
captures subject information more effectively than the traditional arXiv
labeling system, leading to improved text categorization. The approach offers
potential for better navigation and recommendation systems in the rapidly
growing landscape of scientific research literature.
Related papers
- Enriched BERT Embeddings for Scholarly Publication Classification [0.13654846342364302]
The NSLP 2024 FoRC Task I addresses this challenge organized as a competition.
The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article.
arXiv Detail & Related papers (2024-05-07T09:05:20Z) - FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge [54.61068946420894]
We introduce an innovative method by introducing FEature Context and TErm-level Knowledge modules.
To effectively enrich the feature context representations of term weight, the Feature Context Module (FCM) is introduced.
We also develop a term-level knowledge guidance module (TKGM) for effectively utilizing term-level knowledge to intelligently guide the modeling process of term weight.
arXiv Detail & Related papers (2024-04-18T12:58:36Z) - Empowering Interdisciplinary Research with BERT-Based Models: An Approach Through SciBERT-CNN with Topic Modeling [0.0]
This paper introduces a novel approach using the SciBERT model and CNNs to systematically categorize academic abstracts.
The CNN uses convolution and pooling to enhance feature extraction and reduce dimensionality.
arXiv Detail & Related papers (2024-04-16T05:21:47Z) - Text clustering with LLM embeddings [0.0]
We investigate how different textual embeddings and clustering algorithms affect how text datasets are clustered.
Findings reveal that LLM embeddings excel at capturing subtleties in structured language, while BERT leads the lightweight options in performance.
arXiv Detail & Related papers (2024-03-22T11:08:48Z) - Empirical and Experimental Perspectives on Big Data in Recommendation
Systems: A Comprehensive Survey [2.6319554262325924]
This survey paper provides a comprehensive analysis of big data algorithms in recommendation systems.
It proposes a two-pronged approach: a thorough analysis of current algorithms and a novel, hierarchical taxonomy for precise categorization.
arXiv Detail & Related papers (2024-02-01T23:51:29Z) - A Novel Ehanced Move Recognition Algorithm Based on Pre-trained Models
with Positional Embeddings [6.688643243555054]
The recognition of abstracts is crucial for effectively locating the content and clarifying the article.
This paper proposes a novel enhanced move recognition algorithm with an improved pre-trained model and a gated network with attention mechanism for unstructured abstracts of Chinese scientific and technological papers.
arXiv Detail & Related papers (2023-08-14T03:20:28Z) - Be More with Less: Hypergraph Attention Networks for Inductive Text
Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z) - A Survey of Embedding Space Alignment Methods for Language and Knowledge
Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z) - A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z) - Deep Learning feature selection to unhide demographic recommender
systems factors [63.732639864601914]
The matrix factorization model generates factors which do not incorporate semantic knowledge.
DeepUnHide is able to extract demographic information from the users and items factors in collaborative filtering recommender systems.
arXiv Detail & Related papers (2020-06-17T17:36:48Z) - Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.