Presence of informal language, such as emoticons, hashtags, and slang,
impact the performance of sentiment analysis models on social media text?
- URL: http://arxiv.org/abs/2301.12303v1
- Date: Sat, 28 Jan 2023 22:21:51 GMT
- Title: Presence of informal language, such as emoticons, hashtags, and slang,
impact the performance of sentiment analysis models on social media text?
- Authors: Aadil Gani Ganie
- Abstract summary: This study investigated the influence of informal language, such as emoticons and slang, on the performance of sentiment analysis models applied to social media text.
A CNN model was developed and trained on three datasets: a sarcasm dataset, a sentiment dataset, and an emoticon dataset.
The results revealed that the model achieved an accuracy of 96.47% on the sarcasm dataset, with the lowest accuracy for class 1.
The amalgamation of the sarcasm and sentiment datasets improved the accuracy of the model to 95.1%, and the addition of the emoticon dataset had a slight positive impact, raising the accuracy to 95.37%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study aimed to investigate the influence of the presence of informal
language, such as emoticons and slang, on the performance of sentiment analysis
models applied to social media text. A convolutional neural network (CNN) model
was developed and trained on three datasets: a sarcasm dataset, a sentiment
dataset, and an emoticon dataset. The model architecture was held constant for
all experiments and the model was trained on 80% of the data and tested on 20%.
The results revealed that the model achieved an accuracy of 96.47% on the
sarcasm dataset, with the lowest accuracy for class 1. On the sentiment
dataset, the model achieved an accuracy of 95.28%. The amalgamation of the
sarcasm and sentiment datasets improved the accuracy of the model to 95.1%, and
the addition of the emoticon dataset had a slight positive impact, raising the
accuracy to 95.37%. The study suggests that the presence of informal language
has a limited impact on the performance of sentiment analysis models applied
to social media text. However, including emoticon data in the model can
enhance the accuracy slightly.
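The evaluation protocol described in the abstract (an 80%/20% train/test split, with accuracy reported overall and per class) can be sketched as follows. This is a minimal illustration only, not the author's code: the helper names `train_test_split`, `per_class_accuracy`, and `overall_accuracy`, the toy data, and the fixed seed are all assumptions, and the CNN itself is omitted.

```python
import random
from collections import defaultdict


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def overall_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Map each true label to the fraction of its examples predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {label: correct[label] / total[label] for label in total}


# Toy data (hypothetical): 100 (text, label) pairs with labels 0 and 1.
data = [(f"post {i}", i % 2) for i in range(100)]
train, test = train_test_split(data)
```

Reporting per-class accuracy alongside the overall figure is what makes a statement such as "lowest accuracy for class 1" checkable, since a high overall accuracy can hide a weak class.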
Related papers
- Optimizing Transformer based on high-performance optimizer for predicting employment sentiment in American social media content [9.49688045612671]
This article improves the Transformer model using a swarm intelligence optimization algorithm, aiming to predict the emotions of employment-related text content on American social media.
During the training process, the accuracy of the model gradually increased from 49.27% to 82.83%, while the loss value decreased from 0.67 to 0.35.
The improved model proposed in this article not only improves the accuracy of sentiment recognition in employment-related texts on social media, but also has important practical significance.
arXiv Detail & Related papers (2024-10-09T03:14:05Z)
- Phrasing for UX: Enhancing Information Engagement through Computational Linguistics and Creative Analytics [0.0]
This study explores the relationship between textual features and Information Engagement (IE) on digital platforms.
It highlights the impact of computational linguistics and analytics on user interaction.
The READ model is introduced to quantify key predictors like representativeness, ease of use, affect, and distribution.
arXiv Detail & Related papers (2024-08-23T00:33:47Z)
- Text Sentiment Analysis and Classification Based on Bidirectional Gated Recurrent Units (GRUs) Model [6.096738978232722]
This paper explores the importance of text sentiment analysis and classification in the field of natural language processing.
It proposes a new approach to sentiment analysis and classification based on the bidirectional gated recurrent units (GRUs) model.
arXiv Detail & Related papers (2024-04-26T02:40:03Z)
- Evaluating Large Language Models Using Contrast Sets: An Experimental Approach [0.0]
We introduce an innovative technique for generating a contrast set for the Stanford Natural Language Inference dataset.
Our strategy involves the automated substitution of verbs, adverbs, and adjectives with their synonyms to preserve the original meaning of sentences.
This method aims to assess whether a model's performance is based on genuine language comprehension or simply on pattern recognition.
arXiv Detail & Related papers (2024-04-02T02:03:28Z)
- ECRC: Emotion-Causality Recognition in Korean Conversation for GCN [0.0]
We propose the emotion-causality recognition in conversation (ECRC) model, which is based on a novel graph structure.
In this study, we overcome the limitations of previous embeddings by utilizing both word- and sentence-level embeddings.
This model uniquely integrates the bidirectional long short-term memory (Bi-LSTM) and graph convolutional network (GCN) models for Korean conversation analysis.
arXiv Detail & Related papers (2024-03-16T02:07:31Z)
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within data through its natural language counterpart, thereby achieving disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z)
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z)
- Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification [91.3755431537592]
I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids resource limits.
When integrated, both data produce highly accurate predictions (90%, with 10% false-positives and 12% false-negatives).
These results extend research on the importance of simultaneously analyzing behavior and language in massive networks.
arXiv Detail & Related papers (2020-01-04T20:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.