Beyond original Research Articles Categorization via NLP
- URL: http://arxiv.org/abs/2309.07020v1
- Date: Wed, 13 Sep 2023 15:23:30 GMT
- Title: Beyond original Research Articles Categorization via NLP
- Authors: Rosanna Turrisi
- Abstract summary: The study leverages the power of pre-trained language models, specifically SciBERT, to extract meaningful representations of abstracts from the ArXiv dataset.
The results demonstrate that the proposed approach captures subject information more effectively than the traditional arXiv labeling system.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes a novel approach to text categorization -- for unknown
categories -- in the context of scientific literature, using Natural Language
Processing techniques. The study leverages the power of pre-trained language
models, specifically SciBERT, to extract meaningful representations of
abstracts from the ArXiv dataset. Text categorization is performed using the
K-Means algorithm, and the optimal number of clusters is determined based on
the Silhouette score. The results demonstrate that the proposed approach
captures subject information more effectively than the traditional arXiv
labeling system, leading to improved text categorization. The approach offers
potential for better navigation and recommendation systems in the rapidly
growing landscape of scientific research literature.
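The pipeline described in the abstract (SciBERT embeddings of abstracts, K-Means clustering, cluster count chosen by Silhouette score) is straightforward to prototype. Below is a minimal sketch, not the author's exact implementation: it assumes the `allenai/scibert_scivocab_uncased` checkpoint from Hugging Face transformers, scikit-learn's K-Means and Silhouette score, masked mean pooling over the last hidden states (the paper may use a different sentence representation), an illustrative cluster range of 2-20, and a hypothetical `load_arxiv_abstracts` helper.
```python
# Sketch of the described pipeline: SciBERT embeddings -> K-Means -> Silhouette-based model selection.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
model.eval()

def embed(abstracts, batch_size=16):
    """Return one vector per abstract via masked mean pooling of SciBERT's last hidden states."""
    vectors = []
    with torch.no_grad():
        for i in range(0, len(abstracts), batch_size):
            batch = abstracts[i:i + batch_size]
            enc = tokenizer(batch, padding=True, truncation=True,
                            max_length=512, return_tensors="pt")
            hidden = model(**enc).last_hidden_state              # (B, T, H)
            mask = enc["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
            pooled = (hidden * mask).sum(1) / mask.sum(1)        # mean over real tokens only
            vectors.append(pooled.cpu().numpy())
    return np.vstack(vectors)

def best_kmeans(X, k_range=range(2, 21)):
    """Fit K-Means for each k and keep the model with the highest Silhouette score."""
    best_k, best_score, best_model = None, -1.0, None
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        score = silhouette_score(X, km.labels_)
        if score > best_score:
            best_k, best_score, best_model = k, score, km
    return best_k, best_score, best_model

# Usage (hypothetical data loader):
# abstracts = load_arxiv_abstracts(...)   # list of abstract strings
# X = embed(abstracts)
# k, s, km = best_kmeans(X)
# print(f"best k = {k}, silhouette = {s:.3f}")
```
Sweeping k and keeping the highest Silhouette score mirrors the model-selection criterion stated in the abstract; the pooling strategy and cluster range above are assumptions for illustration.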
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- Text classification optimization algorithm based on graph neural network [0.36651088217486427]
This paper introduces a text classification optimization algorithm utilizing graph neural networks.
By introducing an adaptive graph construction strategy and an efficient graph convolution operation, it effectively improves the accuracy and efficiency of text classification.
arXiv Detail & Related papers (2024-08-09T23:25:37Z)
- Enriched BERT Embeddings for Scholarly Publication Classification [0.13654846342364302]
The NSLP 2024 FoRC Task I, organized as a competition, addresses this challenge.
The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article.
arXiv Detail & Related papers (2024-05-07T09:05:20Z)
- FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge [54.61068946420894]
We introduce an innovative method built on FEature Context and TErm-level Knowledge modules.
The Feature Context Module (FCM) is introduced to effectively enrich the feature-context representations of term weights.
We also develop a Term-level Knowledge Guidance Module (TKGM) that uses term-level knowledge to intelligently guide the modeling of term weights.
arXiv Detail & Related papers (2024-04-18T12:58:36Z)
- Empowering Interdisciplinary Research with BERT-Based Models: An Approach Through SciBERT-CNN with Topic Modeling [0.0]
This paper introduces a novel approach using the SciBERT model and CNNs to systematically categorize academic abstracts.
The CNN uses convolution and pooling to enhance feature extraction and reduce dimensionality.
arXiv Detail & Related papers (2024-04-16T05:21:47Z)
- Text Clustering with LLM Embeddings [0.0]
The effectiveness of text clustering largely depends on the selection of textual embeddings and clustering algorithms.
Recent advancements in large language models (LLMs) have the potential to enhance this task.
Findings indicate that LLM embeddings are superior at capturing subtleties in structured language.
arXiv Detail & Related papers (2024-03-22T11:08:48Z)
- Empirical and Experimental Perspectives on Big Data in Recommendation Systems: A Comprehensive Survey [2.6319554262325924]
This survey paper provides a comprehensive analysis of big data algorithms in recommendation systems.
It proposes a two-pronged approach: a thorough analysis of current algorithms and a novel, hierarchical taxonomy for precise categorization.
arXiv Detail & Related papers (2024-02-01T23:51:29Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and have demonstrated promising results on this canonical task.
Despite this success, their performance can be largely jeopardized in practice because they are unable to capture high-order interactions between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.