A Comparison of Lexicon-Based and ML-Based Sentiment Analysis: Are There Outlier Words?
- URL: http://arxiv.org/abs/2311.06221v1
- Date: Fri, 10 Nov 2023 18:21:50 GMT
- Title: A Comparison of Lexicon-Based and ML-Based Sentiment Analysis: Are There Outlier Words?
- Authors: Siddhant Jaydeep Mahajani and Shashank Srivastava and Alan F. Smeaton
- Abstract summary: In this paper we compute sentiment for more than 150,000 English language texts drawn from 4 domains.
We model differences in sentiment scores between approaches for documents in each domain using a regression.
Our findings are that the importance of a word depends on the domain and there are no standout lexical entries which systematically cause differences in sentiment scores.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lexicon-based approaches to sentiment analysis of text are based on each word
or lexical entry having a pre-defined weight indicating its sentiment polarity.
These weights are usually assigned manually, but how accurate they are compared
against machine learning based approaches to computing sentiment is not known.
There may be lexical entries whose sentiment values cause a lexicon-based
approach to give results that differ greatly from those of a machine learning
approach. In this paper we compute sentiment for more than 150,000
English language texts drawn from 4 domains using the Hedonometer, a
lexicon-based technique, and Azure, a contemporary, easy-to-use machine-learning
based approach that is part of the Azure Cognitive Services family of APIs. We
model differences in sentiment scores between approaches for
documents in each domain using a regression and analyse the independent
variables (Hedonometer lexical entries) as indicators of each word's importance
and contribution to the score differences. Our findings are that the importance
of a word depends on the domain and there are no standout lexical entries which
systematically cause differences in sentiment scores.
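The abstract describes two computations: a lexicon-based document score from the Hedonometer, and a regression of the per-document score difference on lexical entries. The pure-Python sketch below illustrates both steps under stated assumptions. The lexicon values are toy numbers, not real labMT scores. The ML scores are hypothetical stand-ins for Azure output, assumed to be already on a 0-1 scale. Rescaling the Hedonometer's 1-9 range to 0-1, using raw word counts as regressors, and excluding a 4-6 near-neutral band are illustrative choices and are not necessarily what the paper does.

```python
from collections import Counter

# Toy lexicon on the Hedonometer's 1-9 happiness scale.
# ASSUMPTION: illustrative values only, not the real labMT scores.
LEXICON = {"happy": 8.3, "love": 8.4, "sad": 2.4, "hate": 2.3, "table": 5.2}


def hedonometer_score(text, lexicon=LEXICON, lens=(4.0, 6.0)):
    """Frequency-weighted mean happiness of lexicon words in `text`.

    Words missing from the lexicon, or whose score falls strictly inside the
    neutral `lens` band (assumed 4-6 here), are ignored. Returns None when no
    word qualifies.
    """
    counts = Counter(text.lower().split())
    num = den = 0.0
    for word, freq in counts.items():
        h = lexicon.get(word)
        if h is None or lens[0] < h < lens[1]:
            continue
        num += freq * h
        den += freq
    return num / den if den else None


def features(text, vocab):
    """Leading 1.0 for the intercept, then the raw count of each vocab word."""
    counts = Counter(text.lower().split())
    return [1.0] + [float(counts[w]) for w in vocab]


def ols(X, y):
    """Ordinary least squares: solve (X^T X) b = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    for c in range(k):
        # Partial pivoting: swap in the row with the largest entry in column c.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    # A is now diagonal, so each coefficient is a single division.
    return [b[i] / A[i][i] for i in range(k)]


if __name__ == "__main__":
    docs = ["happy love love", "sad hate", "happy sad table",
            "love hate hate", "happy happy sad", "love sad sad table"]
    # HYPOTHETICAL stand-ins for an ML service's document scores on a 0-1 scale.
    ml_scores = [0.95, 0.05, 0.40, 0.10, 0.55, 0.30]
    vocab = sorted(w for w, h in LEXICON.items() if not 4.0 < h < 6.0)
    X, y = [], []
    for doc, ml in zip(docs, ml_scores):
        h = hedonometer_score(doc)
        if h is None:
            continue
        X.append(features(doc, vocab))
        y.append(ml - (h - 1.0) / 8.0)  # rescale 1-9 to 0-1, then ML minus lexicon
    for name, coef in zip(["(intercept)"] + vocab, ols(X, y)):
        print(f"{name:12s} {coef:+.3f}")
```

Each printed coefficient estimates how much one occurrence of that word shifts the ML-minus-lexicon gap. A word with a large coefficient in every domain would be the kind of systematic outlier the paper looked for. Fitting one regression per domain mirrors the per-domain analysis described above.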
Related papers
- Lexicon-Based Sentiment Analysis on Text Polarities with Evaluation of Classification Models (2024-09-19T15:31:12Z)
This work uses a lexicon-based method to perform sentiment analysis and evaluates classification models trained on textual data.
The lexicon-based methods identify the intensity of emotion and subjectivity at the word level.
This work addresses a multi-class problem in which text is labeled as positive, negative, or neutral.
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents (2023-05-22T17:58:04Z)
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
- Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis (2022-10-18T12:25:29Z)
SentiWSP is a sentiment-aware pre-trained language model that combines word-level and sentence-level pre-training tasks.
SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks.
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation (2022-10-18T10:03:51Z)
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between a text-hypothesis pair.
- Lex2Sent: A bagging approach to unsupervised sentiment analysis (2022-09-26T20:49:18Z)
In this paper, we propose an alternative approach to classifying texts: Lex2Sent.
To classify texts, we train embedding models to determine the distances between document embeddings and the embeddings of a suitable lexicon.
We show that our model outperforms lexica and provides a basis for a high-performing few-shot fine-tuning approach in the task of binary sentiment analysis.
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks (2021-01-30T18:59:43Z)
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
- Building domain specific lexicon based on TikTok comment dataset (2020-12-16T07:26:43Z)
Previous research focused on sentiment analysis in English, for example, analyzing the sentiment tendency of sentences based on their Valence, Arousal, and Dominance.
This paper tries a method that builds a domain-specific lexicon.
The model can classify Chinese words by emotional tendency.
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding (2020-10-13T21:33:24Z)
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
- A Variational Approach to Unsupervised Sentiment Analysis (2020-08-21T09:52:35Z)
We propose a variational approach to unsupervised sentiment analysis.
We use target-opinion word pairs as a supervision signal.
We apply our method to sentiment analysis on customer reviews and clinical narratives.
- A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text (2020-05-12T21:26:04Z)
This work introduces a new method to consider subjectivity and general context dependency in text analysis.
By using a similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark.
This method could be applied to all cases where evaluating subjectivity is relevant to understanding the relative value or meaning of a text.
- Detecting Domain Polarity-Changes of Words in a Sentiment Lexicon (2020-04-29T17:35:05Z)
Many sentiment words are domain dependent: they may be positive in some domains but negative in others.
We propose a graph-based technique to tackle this problem.
Experimental results show its effectiveness on multiple real-world datasets.
This list is automatically generated from the titles and abstracts of the papers in this site.