Research Experiment on Multi-Model Comparison for Chinese Text Classification Tasks
- URL: http://arxiv.org/abs/2412.18908v1
- Date: Wed, 25 Dec 2024 13:54:40 GMT
- Title: Research Experiment on Multi-Model Comparison for Chinese Text Classification Tasks
- Authors: JiaCheng Li
- Abstract summary: This paper conducts a comparative study of three deep learning models, TextCNN, TextRNN, and FastText, specifically for Chinese text classification tasks.
The performance of these models is evaluated, and their applicability in different scenarios is discussed.
- Score: 12.087640144194246
- Abstract: With the explosive growth of Chinese text data and advancements in natural language processing technologies, Chinese text classification has become one of the key techniques in fields such as information retrieval and sentiment analysis, attracting increasing attention. This paper conducts a comparative study of three deep learning models, TextCNN, TextRNN, and FastText, specifically for Chinese text classification tasks. By conducting experiments on the THUCNews dataset, the performance of these models is evaluated, and their applicability in different scenarios is discussed.
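To make the compared architectures concrete, below is a minimal PyTorch sketch of a TextCNN classifier of the kind evaluated in the paper. The vocabulary size, embedding dimension, filter sizes, and the ten output classes are illustrative assumptions, not the paper's reported THUCNews configuration.

```python
# Minimal TextCNN sketch for (already segmented and indexed) Chinese text.
# All hyperparameters below are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    def __init__(self, vocab_size=50000, embed_dim=300, num_classes=10,
                 kernel_sizes=(2, 3, 4), num_filters=100, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per kernel size, applied along the token dimension.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer indices of segmented Chinese tokens.
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each kernel size.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)  # (batch, num_classes) logits


if __name__ == "__main__":
    model = TextCNN()
    dummy_batch = torch.randint(1, 50000, (8, 64))  # 8 sentences, 64 tokens each
    print(model(dummy_batch).shape)                  # torch.Size([8, 10])
```

A TextRNN or FastText baseline would differ only in the encoder: a recurrent layer over the embeddings, or a simple average of token (and n-gram) embeddings, respectively, followed by the same linear classifier.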
Related papers
- The Text Classification Pipeline: Starting Shallow going Deeper [4.97309503788908]
Text Classification (TC) stands as a cornerstone within the realm of Natural Language Processing (NLP).
This monograph provides an in-depth exploration of the TC pipeline, with a particular emphasis on evaluating the impact of each component on the overall performance of TC models.
The pipeline includes state-of-the-art datasets, text preprocessing techniques, text representation methods, classification models, evaluation metrics, current results and future trends.
arXiv Detail & Related papers (2024-12-30T23:01:19Z) - Putting Context in Context: the Impact of Discussion Structure on Text
Classification [13.15873889847739]
We propose a series of experiments on a large dataset for stance detection in English.
We evaluate the contribution of different types of contextual information.
We show that structural information can be highly beneficial to text classification but only under certain circumstances.
arXiv Detail & Related papers (2024-02-05T12:56:22Z) - Comparative Analysis of Multilingual Text Classification &
Identification through Deep Learning and Embedding Visualization [0.0]
The study employs LangDetect, LangId, FastText, and Sentence Transformer on a dataset encompassing 17 languages.
The FastText multi-layer perceptron model achieved remarkable accuracy, precision, recall, and F1 score, outperforming the Sentence Transformer model.
arXiv Detail & Related papers (2023-12-06T12:03:27Z) - Beyond Turing: A Comparative Analysis of Approaches for Detecting Machine-Generated Text [1.919654267936118]
Traditional shallow learning, Language Model (LM) fine-tuning, and Multilingual Model fine-tuning are evaluated.
Results reveal considerable differences in performance across methods.
This study paves the way for future research aimed at creating robust and highly discriminative models.
arXiv Detail & Related papers (2023-11-21T06:23:38Z) - Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z) - T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text
Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z) - Multi-Dimensional Evaluation of Text Summarization with In-Context
Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z) - OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z) - Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work presents what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z) - Benchmarking Chinese Text Recognition: Datasets, Baselines, and an
Empirical Study [25.609450020149637]
Existing text recognition methods are mainly designed for English texts and ignore the pivotal role of Chinese texts.
We manually collect Chinese text datasets from publicly available competitions, projects, and papers, then divide them into four categories including scene, web, document, and handwriting datasets.
By analyzing the experimental results, we surprisingly observe that state-of-the-art baselines for recognizing English texts do not perform well in Chinese scenarios.
arXiv Detail & Related papers (2021-12-30T15:30:52Z) - Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)