Text Classification: A Review, Empirical, and Experimental Evaluation
- URL: http://arxiv.org/abs/2401.12982v1
- Date: Thu, 11 Jan 2024 08:17:42 GMT
- Title: Text Classification: A Review, Empirical, and Experimental Evaluation
- Authors: Kamal Taha, Paul D. Yoo, Chan Yeun, Aya Taha
- Abstract summary: Existing survey papers categorize algorithms for text classification into broad classes.
We introduce a novel methodological taxonomy that classifies algorithms hierarchically into fine-grained classes and specific techniques.
Our study is the first survey to utilize this methodological taxonomy for classifying algorithms for text classification.
- Score: 2.341806147715478
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The explosive and widespread growth of data necessitates the use of text
classification to extract crucial information from vast amounts of data.
Consequently, there has been a surge of research in both classical and deep
learning text classification methods. Despite the numerous methods proposed in
the literature, there is still a pressing need for a comprehensive and
up-to-date survey. Existing survey papers categorize algorithms for text
classification into broad classes, which can lead to the misclassification of
unrelated algorithms and incorrect assessments of their qualities and behaviors
using the same metrics. To address these limitations, our paper introduces a
novel methodological taxonomy that classifies algorithms hierarchically into
fine-grained classes and specific techniques. The taxonomy includes methodology
categories, methodology techniques, and methodology sub-techniques. Our study
is the first survey to utilize this methodological taxonomy for classifying
algorithms for text classification. Furthermore, our study also conducts
empirical evaluation and experimental comparisons and rankings of different
algorithms that employ the same specific sub-technique, different
sub-techniques within the same technique, different techniques within the same
category, and categories
Related papers
- An Instance-based Plus Ensemble Learning Method for Classification of Scientific Papers [2.0794749869068005]
This paper introduces a novel approach that combines instance-based learning and ensemble learning techniques for classifying scientific papers.
Experiments show that the proposed classification method is effective and efficient in categorizing papers into various research areas.
arXiv Detail & Related papers (2024-09-21T19:42:15Z)
- Empirical and Experimental Insights into Data Mining Techniques for Crime Prediction: A Comprehensive Survey [0.8702432681310399]
The paper covers the statistical methods, machine learning algorithms, and deep learning techniques employed to analyze crime data.
We propose a methodological taxonomy that classifies crime prediction algorithms into specific techniques.
arXiv Detail & Related papers (2024-02-01T23:51:29Z)
- Empirical and Experimental Perspectives on Big Data in Recommendation Systems: A Comprehensive Survey [2.6319554262325924]
This survey paper provides a comprehensive analysis of big data algorithms in recommendation systems.
It proposes a two-pronged approach: a thorough analysis of current algorithms and a novel, hierarchical taxonomy for precise categorization.
arXiv Detail & Related papers (2023-12-12T22:27:29Z)
- Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques.
The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that aimed at clustering text documents.
arXiv Detail & Related papers (2023-09-24T21:49:51Z)
- Text Classification: A Perspective of Deep Learning Methods [0.0679877553227375]
This paper introduces deep learning-based text classification algorithms, including important steps required for text classification tasks.
At the end of the article, different deep learning text classification methods are compared and summarized.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first asymptotically precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)
- Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-02-11T18:34:48Z)
- Imbalanced classification: a paradigm-based review [21.578692329486643]
Multiple resampling techniques have been proposed to address the class imbalance issues.
There is no general guidance on when to use each technique.
We provide a paradigm-based review of the common resampling techniques for binary classification under imbalanced class sizes.
arXiv Detail & Related papers (2020-02-11T18:34:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.