Lexicon-based Methods vs. BERT for Text Sentiment Analysis
- URL: http://arxiv.org/abs/2111.10097v1
- Date: Fri, 19 Nov 2021 08:47:32 GMT
- Title: Lexicon-based Methods vs. BERT for Text Sentiment Analysis
- Authors: Anastasia Kotelnikova, Danil Paschenko, Klavdiya Bochenina, Evgeny Kotelnikov
- Abstract summary: The SO-CAL and SentiStrength lexicon-based methods are adapted for the Russian language.
RuBERT outperforms both lexicon-based methods on average, but SO-CAL surpasses RuBERT for four corpora out of 16.
- Score: 0.15293427903448023
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The performance of sentiment analysis methods has greatly increased in recent
years. This is due to the use of various models based on the Transformer
architecture, in particular BERT. However, deep neural network models are
difficult to train and poorly interpretable. An alternative approach is
rule-based methods using sentiment lexicons. They are fast, require no
training, and are highly interpretable. But recently, due to the widespread use of
deep learning, lexicon-based methods have receded into the background. The
purpose of the article is to study the performance of the SO-CAL and
SentiStrength lexicon-based methods, adapted for the Russian language. We have
tested these methods, as well as the RuBERT neural network model, on 16 text
corpora and have analyzed their results. RuBERT outperforms both lexicon-based
methods on average, but SO-CAL surpasses RuBERT for four corpora out of 16.
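To make the lexicon-based side of the comparison concrete, a toy scorer in the spirit of SO-CAL and SentiStrength might look as follows (the lexicon entries, intensifier weights, and negation rule are illustrative placeholders, not the resources used in the paper):

```python
# Minimal lexicon-based sentiment scorer: word polarities, intensifiers, negation.
LEXICON = {"good": 2, "great": 3, "bad": -2, "terrible": -3, "boring": -2}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}

def lexicon_score(text: str) -> float:
    tokens = text.lower().split()
    score, weight, negate = 0.0, 1.0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            value = LEXICON[tok] * weight
            score += -value if negate else value
            weight, negate = 1.0, False
    return score  # > 0 positive, < 0 negative, 0 neutral

print(lexicon_score("the film was not very good"))  # -3.0
```

Such a scorer needs no training and its decisions are directly traceable to lexicon entries, which is the interpretability advantage the abstract refers to.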
Related papers
- Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition in Conversations [0.7874708385247353]
We propose to combine the two approaches to perform Emotion Recognition in Conversations (ERC).
We feed utterances and their previous conversational turns to a pre-trained RoBERTa, obtaining contextual embedding utterance representations.
We validate our approach on the widely used DailyDialog ERC benchmark dataset.
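A rough sketch of the embedding step, assuming a Hugging Face RoBERTa checkpoint and first-token pooling (both are assumptions, not necessarily the paper's exact setup):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

previous_turns = ["How was the concert?", "It started an hour late."]
utterance = "Honestly, I was furious by the end."
# Concatenate previous turns and the current utterance with the separator token.
text = tok.sep_token.join(previous_turns + [utterance])

inputs = tok(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    # Use the first (<s>) token's hidden state as the contextual utterance embedding.
    embedding = model(**inputs).last_hidden_state[:, 0]
print(embedding.shape)  # torch.Size([1, 768])
```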
arXiv Detail & Related papers (2023-09-08T12:26:01Z)
- Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets.
First, we assess their vectorization overhead for converting all input entities into dense embeddings vectors.
Second, we investigate their blocking performance, performing a detailed scalability analysis, and comparing them with the state-of-the-art deep learning-based blocking method.
Third, we conclude with their relative performance for both supervised and unsupervised matching.
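A minimal illustration of embedding-based blocking, assuming a sentence-transformers model and a small neighbour count (both choices are illustrative, not the benchmark configuration):

```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

records = ["iPhone 13 Pro 128GB graphite",
           "Apple iPhone 13 Pro, 128 GB, Graphite",
           "Samsung Galaxy S21 Ultra 256GB"]

# Vectorize entity descriptions into dense embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(records, normalize_embeddings=True)

# Blocking: keep only each record's nearest neighbours as candidate matches.
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(vectors)
distances, candidates = index.kneighbors(vectors)
print(candidates)  # each row: the record itself plus its closest candidate match
```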
arXiv Detail & Related papers (2023-04-24T08:53:54Z)
- Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
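A minimal classification sketch with a RoBERTa backbone (checkpoint and label count are placeholders; the classification head would still need fine-tuning on a labelled sentiment corpus before the scores are meaningful):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

batch = tok(["The plot was dull.", "A wonderful, moving film."],
            padding=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=-1)
print(probs)  # per-class probabilities (random until the head is fine-tuned)
```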
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
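In much simplified form, the ensemble idea can be sketched as majority voting over the coreference-link decisions of several transfer-learned models (the per-model outputs below are hypothetical placeholders):

```python
from collections import Counter

def majority_vote(predictions: list[bool]) -> bool:
    """Final link decision for a mention pair: the most common per-model vote."""
    return Counter(predictions).most_common(1)[0][0]

# e.g. votes from an mBERT-based, an XLM-R-based, and a Wikipedia-bootstrapped model
per_model_links = [True, True, False]
print(majority_vote(per_model_links))  # True
```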
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- BERT-Based Combination of Convolutional and Recurrent Neural Network for Indonesian Sentiment Analysis [0.0]
This research extends the previous hybrid deep learning using BERT representation for Indonesian sentiment analysis.
Our simulation shows that the BERT representation improves the accuracies of all hybrid architectures.
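One possible shape of such a hybrid, assuming an Indonesian BERT checkpoint and illustrative layer sizes (not the paper's exact architecture):

```python
import torch.nn as nn
from transformers import AutoModel

class BertCnnLstmClassifier(nn.Module):
    """BERT token embeddings -> 1D convolution -> BiLSTM -> sentiment classes."""
    def __init__(self, bert_name="indobenchmark/indobert-base-p1", n_classes=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.conv = nn.Conv1d(hidden, 128, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(128, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) contextual token embeddings from BERT
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        x = self.conv(out.transpose(1, 2)).relu()  # (batch, 128, seq_len)
        x, _ = self.lstm(x.transpose(1, 2))        # (batch, seq_len, 128)
        return self.fc(x[:, -1])                   # last time step as a simple pooling choice
```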
arXiv Detail & Related papers (2022-11-10T00:32:40Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
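A rough sketch of the noise-stability idea: penalize how much an output changes when Gaussian noise is injected into a hidden representation (the noise scale, the weighting, and the helper names are assumptions, not the paper's exact formulation):

```python
import torch

def noise_stability_loss(head, hidden, sigma=0.01):
    """Penalize output drift when standard Gaussian noise perturbs a hidden state."""
    clean = head(hidden)
    noisy = head(hidden + sigma * torch.randn_like(hidden))
    return ((clean - noisy) ** 2).mean()

# Hypothetical use inside a fine-tuning step:
# loss = task_loss + reg_weight * noise_stability_loss(classifier_head, pooled_output)
```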
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- KinyaBERT: a Morphology-aware Kinyarwanda Language Model [1.2183405753834562]
Unsupervised sub-word tokenization methods are sub-optimal at handling morphologically rich languages.
We propose a simple yet effective two-tier BERT architecture that leverages a morphological analyzer and explicitly represents morphological compositionality.
We evaluate our proposed method on the low-resource morphologically rich Kinyarwanda language, naming the proposed model architecture KinyaBERT.
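A simplified sketch of the two-tier idea, where a small encoder over each word's morphemes yields word vectors that a sentence-level encoder then contextualizes (dimensions and layer counts are illustrative, not KinyaBERT's):

```python
import torch.nn as nn

class TwoTierEncoder(nn.Module):
    def __init__(self, n_morphemes=10000, dim=128):
        super().__init__()
        self.morph_emb = nn.Embedding(n_morphemes, dim)
        self.word_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.sent_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=4)

    def forward(self, morph_ids):                      # (batch, n_words, morphs_per_word)
        b, w, m = morph_ids.shape
        x = self.morph_emb(morph_ids).view(b * w, m, -1)
        word_vecs = self.word_encoder(x).mean(dim=1)   # pool morphemes into word vectors
        return self.sent_encoder(word_vecs.view(b, w, -1))
```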
arXiv Detail & Related papers (2022-03-16T08:36:14Z)
- Does BERT look at sentiment lexicon? [0.0]
We study the attention weights matrices of the Russian-language RuBERT model.
We fine-tune RuBERT on sentiment text corpora and compare the distributions of attention weights for sentiment and neutral lexicons.
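A minimal sketch of such an inspection, assuming the DeepPavlov RuBERT checkpoint and averaging the attention each token receives over all layers and heads (the checkpoint and aggregation choice are assumptions, not the paper's exact procedure):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModel.from_pretrained("DeepPavlov/rubert-base-cased", output_attentions=True)

text = "Фильм был замечательный, но концовка разочаровала."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions  # one (1, heads, seq, seq) tensor per layer

# Average attention each token receives, over layers, heads, and query positions.
received = torch.stack(attentions).mean(dim=(0, 2, 3)).squeeze(0)
for token, score in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), received):
    print(f"{token:>15s}  {score.item():.4f}")
```

Comparing these scores for lexicon words against neutral words is one simple way to probe whether the fine-tuned model attends to sentiment-bearing vocabulary.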
arXiv Detail & Related papers (2021-11-19T08:50:48Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
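A toy sketch of a compositional output layer, where each candidate word's output embedding is built from its characters so that no parameter scales with the word vocabulary (the sizes and the GRU-based composition are illustrative, not the paper's design):

```python
import torch.nn as nn

class CompositionalOutputLayer(nn.Module):
    def __init__(self, n_chars=128, dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, dim)
        self.char_rnn = nn.GRU(dim, dim, batch_first=True)

    def word_embeddings(self, char_ids):       # (n_words, max_chars) character ids
        _, h = self.char_rnn(self.char_emb(char_ids))
        return h.squeeze(0)                    # (n_words, dim) composed word vectors

    def forward(self, hidden, char_ids):
        # hidden: (batch, dim) context vectors; returns logits over candidate words
        return hidden @ self.word_embeddings(char_ids).t()
```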
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Deep Unfolding Network for Image Super-Resolution [159.50726840791697]
This paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods.
The proposed network inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model.
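A toy sketch of the unfolding idea, alternating a crude data-consistency step with a learned denoiser prior for a fixed number of iterations (the modules and step sizes are illustrative, not the paper's actual data and prior modules):

```python
import torch
import torch.nn as nn

class UnfoldedSR(nn.Module):
    def __init__(self, n_iters=4):
        super().__init__()
        self.n_iters = n_iters
        self.denoisers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))
            for _ in range(n_iters))
        self.step = nn.Parameter(torch.full((n_iters,), 0.5))

    def forward(self, y_up):  # y_up: bicubically upsampled low-resolution image
        x = y_up
        for k in range(self.n_iters):
            x = x - self.step[k] * (x - y_up)  # simplified data-consistency step
            x = x + self.denoisers[k](x)       # learned prior (residual denoiser)
        return x
```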
arXiv Detail & Related papers (2020-03-23T17:55:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.