Sentiment analysis in tweets: an assessment study from classical to
modern text representation models
- URL: http://arxiv.org/abs/2105.14373v1
- Date: Sat, 29 May 2021 21:05:28 GMT
- Title: Sentiment analysis in tweets: an assessment study from classical to
modern text representation models
- Authors: Sérgio Barreto, Ricardo Moura, Jonnathan Carvalho, Aline Paes,
Alexandre Plastino
- Abstract summary: Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models for distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
- Score: 59.107260266206445
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: With the growth of social media, such as Twitter, plenty of user-generated
data emerges daily. The short texts published on Twitter -- the tweets -- have
earned significant attention as a rich source of information to guide many
decision-making processes. However, their inherent characteristics, such as their
informal and noisy linguistic style, remain challenging for many natural
language processing (NLP) tasks, including sentiment analysis. Sentiment
classification is tackled mainly by machine learning-based classifiers. The
literature has adopted word representations of distinct natures to transform
tweets into vector-based inputs to feed sentiment classifiers. The
representations come from simple count-based methods, such as bag-of-words, to
more sophisticated ones, such as BERTweet, built upon the trendy BERT
architecture. Nevertheless, most studies evaluate those
models using only a small number of datasets. Despite the progress made in
recent years in language modelling, there is still a gap regarding a robust
evaluation of induced embeddings applied to sentiment analysis on tweets.
Furthermore, while fine-tuning models on downstream tasks is now commonplace,
less attention has been given to adjustments based on the specific
linguistic style of the data. In this context, this study carries out an assessment
of existing language models in distinguishing the sentiment expressed in tweets
by using a rich collection of 22 datasets from distinct domains and five
classification algorithms. The evaluation includes static and contextualized
representations. Contexts are assembled from Transformer-based autoencoder
models that are also fine-tuned based on the masked language model task, using
a plethora of strategies.
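The pipeline the abstract describes, where representations of tweets feed a sentiment classifier, can be sketched with the simplest count-based baseline. This is a minimal sketch with invented toy tweets and labels, assuming scikit-learn is available; it is not the study's actual setup, and the contextualized models the study covers (such as BERTweet) would replace the vectorizer with embedding extraction:

```python
# Bag-of-words representation feeding a classical sentiment classifier
# (toy data for illustration only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = [
    "great match today",
    "worst service ever",
    "love this song",
    "terrible traffic again",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy labels)

# Count-based representation: each tweet becomes a vector of token counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets)

# Any vector-based classifier can consume these inputs.
clf = LogisticRegression().fit(X, labels)
pred = clf.predict(vectorizer.transform(["love this match"]))
```

Swapping `CountVectorizer` for static embeddings (e.g. averaged word vectors) or contextualized Transformer representations changes only the feature-extraction step; the classifier interface stays the same.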
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z)
- Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data [3.2771631221674333]
We model the detection of topic-related content as a binary classification task.
Using only a few hundred annotated data points per topic, we detect content related to three German policies.
arXiv Detail & Related papers (2024-07-23T14:31:59Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for text classification problems in domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification [22.265865542786084]
We propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks.
Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models.
arXiv Detail & Related papers (2020-10-23T14:11:04Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are at distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly under-perform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z)
- Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text [1.6752182911522517]
We present a labeled dataset called MultiSenti for sentiment classification of code-switched informal short text.
We propose a deep learning-based model for sentiment classification of code-switched informal short text.
arXiv Detail & Related papers (2020-01-04T06:31:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.