Khmer Text Classification Using Word Embedding and Neural Networks
- URL: http://arxiv.org/abs/2112.06748v1
- Date: Mon, 13 Dec 2021 15:57:32 GMT
- Title: Khmer Text Classification Using Word Embedding and Neural Networks
- Authors: Rina Buoy and Nguonly Taing and Sovisal Chenda
- Abstract summary: We discuss various classification approaches for Khmer text.
A Khmer word embedding model is trained on a 30-million-Khmer-word corpus to construct word vector representations.
We evaluate the performance of different approaches on a news article dataset for both multi-class and multi-label text classification tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text classification, the task of assigning labels to open-ended text, is one
of the fundamental tasks in natural language processing and is useful for various
applications such as sentiment analysis. In this paper, we discuss various
classification approaches for Khmer text, ranging from the classical TF-IDF
algorithm with a support vector machine classifier to modern word
embedding-based neural network classifiers, including a linear-layer model, a
recurrent neural network, and a convolutional neural network. A Khmer word
embedding model is trained on a 30-million-Khmer-word corpus to construct word
vector representations that are used to train three different neural network
classifiers. We evaluate the performance of these approaches on a news article
dataset for both multi-class and multi-label text classification tasks. The
results suggest that the neural network classifiers using the word embedding
model consistently outperform the traditional classifier using TF-IDF. The
recurrent neural network classifier performs slightly better than the
convolutional network and the linear-layer network.
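The classical baseline described above pairs TF-IDF features with a support vector machine. A minimal scikit-learn sketch of such a pipeline is shown below; the abstract does not specify the tokenizer, vectorizer settings, or SVM variant, so the whitespace tokenization, `LinearSVC`, and placeholder data here are all assumptions. Khmer script does not delimit words with spaces, so the input is assumed to be pre-segmented into words.

```python
# Hypothetical TF-IDF + SVM baseline. Assumes articles are already
# word-segmented (tokens joined by spaces); the paper's actual
# preprocessing and SVM settings are not specified in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

docs = ["segmented khmer article one", "segmented khmer article two"]  # placeholders
labels = ["sports", "economy"]                                         # placeholder classes

baseline = Pipeline([
    # Treat any run of non-space characters as a token so Khmer words survive.
    ("tfidf", TfidfVectorizer(token_pattern=r"\S+")),
    ("svm", LinearSVC()),
])
baseline.fit(docs, labels)
print(baseline.predict(["another segmented article"]))
```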
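The abstract states that the embedding model is trained on a 30-million-word corpus but does not name the algorithm. As one plausible stand-in, the sketch below trains a skip-gram Word2Vec model with gensim; the corpus file name, the hyperparameters, and the 300-dimensional vector size are all assumptions.

```python
# Hedged sketch: train Khmer word vectors with gensim Word2Vec.
# The paper's actual embedding algorithm and hyperparameters are not
# stated in the abstract; everything below is illustrative.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Hypothetical corpus file: one sentence per line, words separated by
# spaces (word segmentation assumed to have been done beforehand).
sentences = LineSentence("khmer_corpus.txt")

w2v = Word2Vec(
    sentences,
    vector_size=300,  # assumed embedding dimension
    window=5,
    min_count=5,
    sg=1,             # skip-gram
    workers=4,
)
w2v.save("khmer_w2v.model")
```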
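All three neural classifiers in the abstract consume the same pretrained word vectors and differ mainly in how they reduce a token sequence to a document representation. Below is a hedged PyTorch sketch of the RNN variant; the GRU cell, hidden size, and last-state pooling are assumptions rather than the paper's reported configuration. The same model covers both evaluation settings: multi-class classification uses softmax cross-entropy over the logits, while multi-label classification gives each class an independent sigmoid with binary cross-entropy.

```python
# Hedged sketch of an embedding-based RNN classifier (not the paper's
# exact architecture): pretrained vectors -> GRU -> linear logits.
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, embedding_matrix: torch.Tensor, num_classes: int):
        super().__init__()
        # Initialize the embedding layer from the pretrained Khmer vectors.
        self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
        self.rnn = nn.GRU(embedding_matrix.size(1), 128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)     # (batch, seq_len, emb_dim)
        _, h = self.rnn(x)            # h: (1, batch, 128), last hidden state
        return self.fc(h.squeeze(0))  # raw logits: (batch, num_classes)

# Illustrative sizes: 10,000-word vocabulary, 300-dim vectors, 12 classes.
model = RNNClassifier(torch.randn(10_000, 300), num_classes=12)

# Multi-class (one label per article): softmax cross-entropy over logits.
multiclass_loss = nn.CrossEntropyLoss()
# Multi-label (several labels per article): per-class sigmoid + BCE.
multilabel_loss = nn.BCEWithLogitsLoss()
```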
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- GNN-LoFI: a Novel Graph Neural Network through Localized Feature-based Histogram Intersection [51.608147732998994]
Graph neural networks are increasingly becoming the framework of choice for graph-based machine learning.
We propose a new graph neural network architecture that substitutes classical message passing with an analysis of the local distribution of node features.
arXiv Detail & Related papers (2024-01-17T13:04:23Z)
- Multi-label Text Classification using GloVe and Neural Network Models [0.27195102129094995]
Existing solutions include traditional machine learning and deep neural network models.
This paper proposes a method utilizing the bag-of-words model approach based on the GloVe model and the CNN-BiLSTM network.
The method achieves an accuracy rate of 87.26% on the test set and an F1 score of 0.8737, showcasing promising results.
arXiv Detail & Related papers (2023-10-25T01:30:26Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Computing Class Hierarchies from Classifiers [12.631679928202516]
We propose a novel algorithm for automatically acquiring a class hierarchy from a neural network.
Our algorithm produces surprisingly good hierarchies for some well-known deep neural network models.
arXiv Detail & Related papers (2021-12-02T13:01:04Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Does a Hybrid Neural Network based Feature Selection Model Improve Text Classification? [9.23545668304066]
We propose a hybrid feature selection method for obtaining relevant features.
We then present three ways of implementing a feature selection and neural network pipeline.
We also observed a slight increase in accuracy on some datasets.
arXiv Detail & Related papers (2021-01-22T09:12:19Z)
- Provably Training Neural Network Classifiers under Fairness Constraints [70.64045590577318]
We show that overparametrized neural networks can meet the fairness constraints.
A key ingredient in building a fair neural network classifier is establishing a no-regret analysis for neural networks.
arXiv Detail & Related papers (2020-12-30T18:46:50Z)
- Effect of Word Embedding Models on Hate and Offensive Speech Detection [1.7403133838762446]
We investigate the impact of both word embedding models and neural network architectures on the predictive accuracy.
We first train several word embedding models on a large-scale unlabelled Arabic text corpus.
For each detection task, we train several neural network classifiers using the pre-trained word embedding models.
This process yields a large number of learned models, enabling an exhaustive comparison.
arXiv Detail & Related papers (2020-11-23T02:43:45Z)
- Combine Convolution with Recurrent Networks for Text Classification [12.92202472766078]
We propose a novel method that retains the strengths of both networks to a great extent.
In the proposed model, a convolutional neural network is applied to learn a 2D weight matrix where each row reflects the importance of each word from different aspects.
We use a bi-directional RNN to process each word and employ a neural tensor layer that fuses forward and backward hidden states to get word representations.
arXiv Detail & Related papers (2020-06-29T03:36:04Z)
- Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers [48.11796810425477]
We show that information bottleneck (IB) learning is, in fact, equivalent to a special class of the quantization problem.
We propose a novel learning framework, "Aggregated Learning", for classification with neural network models.
The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.
arXiv Detail & Related papers (2020-01-12T16:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.