Word Embedding Techniques for Classification of Star Ratings
- URL: http://arxiv.org/abs/2504.13653v1
- Date: Fri, 18 Apr 2025 12:26:28 GMT
- Title: Word Embedding Techniques for Classification of Star Ratings
- Authors: Hesham Abdelmotaleb, Craig McNeile, Malgorzata Wojtys
- Abstract summary: This research uses a novel dataset of telecom customers' reviews to perform an extensive study showing how different word embedding algorithms can affect the text classification process. Several state-of-the-art word embedding techniques are considered, including BERT, Word2Vec and Doc2Vec, coupled with several classification algorithms.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Telecom services are at the core of everyday life in today's societies. The availability of numerous online forums and discussion platforms enables telecom providers to improve their services by exploring the views of their customers to learn about common issues that the customers face. Natural Language Processing (NLP) tools can be used to process the free text collected. One way of working with such data is to represent text as numerical vectors using one of many word embedding models based on neural networks. This research uses a novel dataset of telecom customers' reviews to perform an extensive study showing how different word embedding algorithms can affect the text classification process. Several state-of-the-art word embedding techniques are considered, including BERT, Word2Vec and Doc2Vec, coupled with several classification algorithms. The important issue of feature engineering and dimensionality reduction is addressed, and several PCA-based approaches are explored. Moreover, the energy consumption of the different word embedding models is investigated. The findings show that some word embedding models can lead to consistently better text classifiers in terms of precision, recall and F1-score. In particular, for the more challenging classification tasks, BERT combined with PCA stood out with the highest performance metrics. Moreover, our proposed PCA approach of combining word vectors using the first principal component shows clear advantages in performance over the traditional approach of taking the average.
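As a concrete illustration of the pooling comparison in the abstract, here is a minimal sketch contrasting the traditional average of a document's word vectors with a document vector built from the first principal component. The embedding matrix is random stand-in data, and fitting PCA per document is an assumption here, not the paper's confirmed construction.

```python
# Minimal sketch: pooling a document's word vectors into one feature
# vector, by mean vs. by first principal component (PC1).
# The embeddings below are random stand-ins for Word2Vec/BERT outputs;
# fitting PCA per document is an assumption, not the paper's spec.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
word_vectors = rng.normal(size=(25, 300))  # 25 tokens x 300-dim embeddings

# Traditional pooling: element-wise average of the token vectors.
doc_vec_mean = word_vectors.mean(axis=0)   # shape (300,)

# PCA pooling: the first principal component of the token vectors,
# i.e. the 300-dim direction of maximum variance across tokens.
pca = PCA(n_components=1).fit(word_vectors)
doc_vec_pc1 = pca.components_[0]           # shape (300,)

print(doc_vec_mean.shape, doc_vec_pc1.shape)
```

Either pooled vector can then be fed to any standard classifier (e.g., logistic regression or an SVM), which is how the embedding/classifier pairings described in the abstract would be coupled.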
Related papers
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models.
We conduct comprehensive experiments on the English datasets QQP and MRPC, and the Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- Recent Advances in Named Entity Recognition: A Comprehensive Survey and Comparative Study [8.91661466156389]
We present an overview of recent popular approaches to NER. We discuss reinforcement learning and graph-based approaches, highlighting their role in enhancing NER performance. We evaluate the performance of the main NER implementations on a variety of datasets with differing characteristics.
arXiv Detail & Related papers (2024-01-19T17:21:05Z)
- A Process for Topic Modelling Via Word Embeddings [0.0]
This work combines algorithms based on word embeddings, dimensionality reduction, and clustering.
The objective is to obtain topics from a set of unclassified texts (a generic pipeline sketch follows this entry).
arXiv Detail & Related papers (2023-10-06T15:10:35Z)
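The entry above outlines an embeddings, dimensionality reduction, and clustering pipeline; the sketch below shows one plausible instantiation with scikit-learn. The random document embeddings, PCA, and KMeans are illustrative assumptions, not necessarily the paper's exact components.

```python
# Generic topic-modelling pipeline: embed, reduce, cluster.
# Random document embeddings stand in for real ones (e.g., Doc2Vec);
# PCA and KMeans are assumed choices, not the paper's exact ones.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(200, 300))  # 200 unlabeled documents

reduced = PCA(n_components=10).fit_transform(doc_embeddings)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(reduced)

# Each cluster of documents is then treated as one "topic".
for k in range(5):
    print(f"topic {k}: {np.sum(labels == k)} documents")
```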
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
This work incorporates Self-Supervised Learning (SSL) into the model learning process and designs a novel self-supervised Relation of Relation (R2) classification task.
It proposes a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
It also exploits external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers [0.0]
This study covers NLP models ranging from classical (Bag-of-Words) to state-of-the-art (Transformer-based).
It aims to provide a comprehensive experimental study of embedding approaches targeting a binary sentiment classification of user reviews in Brazilian Portuguese.
arXiv Detail & Related papers (2022-12-01T15:24:19Z)
- Using virtual edges to extract keywords from texts modeled as complex networks [0.1611401281366893]
We modeled texts as co-occurrence networks, where nodes are words and edges are established by contextual or semantic similarity.
We found that, in fact, the use of virtual edges can improve the discriminability of co-occurrence networks (a toy sketch follows this entry).
arXiv Detail & Related papers (2022-05-04T16:43:03Z)
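As flagged above, here is a toy sketch of a co-occurrence network augmented with "virtual" edges: co-occurrence edges come from a sliding window, and extra edges connect words whose (stand-in) embedding vectors are sufficiently similar. The tokens, embeddings, window size, and threshold are all illustrative assumptions.

```python
# Toy sketch of a co-occurrence network with "virtual" edges.
# Real systems would use actual texts and trained embeddings; the
# tokens and embedding vectors here are fabricated for illustration.
import itertools
import networkx as nx
import numpy as np

tokens = ["network", "graph", "node", "edge", "word", "text"]
rng = np.random.default_rng(1)
emb = {t: rng.normal(size=50) for t in tokens}  # stand-in word vectors

G = nx.Graph()
# Co-occurrence edges: connect words appearing within a window of 2.
window = 2
for i, w in enumerate(tokens):
    for j in range(i + 1, min(i + window + 1, len(tokens))):
        G.add_edge(w, tokens[j], kind="cooccurrence")

# Virtual edges: connect word pairs whose cosine similarity exceeds
# an (arbitrary) threshold.
def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for u, v in itertools.combinations(tokens, 2):
    if not G.has_edge(u, v) and cos(emb[u], emb[v]) > 0.2:
        G.add_edge(u, v, kind="virtual")

print(G.number_of_nodes(), G.number_of_edges())
```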
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- A Comparison of Word2Vec, HMM2Vec, and PCA2Vec for Malware Classification [3.0969191504482247]
We first consider multiple different word embedding techniques within the context of malware classification.
We derive feature embeddings based on opcode sequences for malware samples from a variety of different families.
We show that we can obtain better classification accuracy based on these feature embeddings (a toy sketch follows this entry).
arXiv Detail & Related papers (2021-03-07T14:41:18Z)
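A toy sketch of the opcode-embedding idea from the entry above: train Word2Vec on opcode sequences and pool the opcode vectors into one feature vector per sample. It assumes gensim's Word2Vec (4.x API); the opcode sequences are fabricated, and mean pooling is an assumed choice rather than the paper's exact feature construction.

```python
# Toy sketch: Word2Vec embeddings over opcode sequences (gensim 4.x API).
# The opcode sequences are fabricated; mean pooling per sample is an
# assumption, not necessarily the paper's feature construction.
import numpy as np
from gensim.models import Word2Vec

samples = [
    ["mov", "push", "call", "pop", "ret"],
    ["mov", "add", "mov", "jmp", "cmp"],
    ["push", "call", "call", "ret", "nop"],
]

model = Word2Vec(samples, vector_size=16, window=2, min_count=1, sg=1, epochs=50)

# One fixed-length feature vector per sample: average its opcode vectors.
features = np.array([
    np.mean([model.wv[op] for op in seq], axis=0) for seq in samples
])
print(features.shape)  # (3, 16)
```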
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite this success, their performance can be largely jeopardized in practice because they are unable to capture high-order interactions between words.
We propose a principled model, hypergraph attention networks (HyperGAT), which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format (a small illustration of this format follows this entry).
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
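The text-to-text framing in the last entry casts every task as string-in, string-out, distinguished only by a task prefix. The sketch below shows the flavour of that unified format with examples in the style of the T5 paper; the exact prefixes and target strings here are illustrative, not quoted from the paper.

```python
# The unified text-to-text format: every NLP task becomes a pair of
# strings, distinguished only by a task prefix. Examples are
# illustrative, in the style of the T5 paper's task prefixes.
examples = [
    # Translation
    {"input": "translate English to German: That is good.",
     "target": "Das ist gut."},
    # Summarization
    {"input": "summarize: state authorities dispatched emergency crews ...",
     "target": "authorities dispatched emergency crews ..."},
    # Classification, with the label itself emitted as text
    {"input": "cola sentence: The course is jumping well.",
     "target": "not acceptable"},
]

for ex in examples:
    print(f"{ex['input']!r} -> {ex['target']!r}")
```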