The emojification of sentiment on social media: Collection and analysis
of a longitudinal Twitter sentiment dataset
- URL: http://arxiv.org/abs/2108.13898v1
- Date: Tue, 31 Aug 2021 14:54:46 GMT
- Authors: Wenjie Yin, Rabab Alkhalifa, Arkaitz Zubiaga
- Abstract summary: TM-Senti is a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets.
We describe and assess our methodology to put together a large-scale, emoticon- and emoji-based labelled sentiment analysis dataset.
Our analysis highlights interesting temporal changes, among others in the increasing use of emojis over emoticons.
- Score: 5.528896840956628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media, as a means for computer-mediated communication, has been
extensively used to study the sentiment expressed by users around events or
topics. There is however a gap in the longitudinal study of how sentiment
evolved in social media over the years. To fill this gap, we develop TM-Senti,
a new large-scale, distantly supervised Twitter sentiment dataset with over 184
million tweets and covering a time period of over seven years. We describe and
assess our methodology to put together a large-scale, emoticon- and emoji-based
labelled sentiment analysis dataset, along with an analysis of the resulting
dataset. Our analysis highlights interesting temporal changes, among others in
the increasing use of emojis over emoticons. We publicly release the dataset
for further research in tasks including sentiment analysis and text
classification of tweets. The dataset can be fully rehydrated including tweet
metadata and without missing tweets thanks to the archive of tweets publicly
available on the Internet Archive, which the dataset is based on.
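The emoticon- and emoji-based distant supervision mentioned in the abstract can be illustrated with a minimal sketch. The marker sets and the `distant_label` function below are hypothetical and chosen only for illustration; they are not the lexicon or the procedure used to build TM-Senti. The sketch assumes a tweet is labelled only when it contains markers of a single polarity, and that the markers are stripped from the text so a classifier cannot simply memorise them.

```python
# Hypothetical marker sets (assumption): the dataset's real emoticon and
# emoji lists are not given in this listing.
POSITIVE = {":)", ":-)", ":D", "😀", "😊"}
NEGATIVE = {":(", ":-(", "😢", "😠"}


def distant_label(tweet):
    """Return (cleaned_text, label), or None if the tweet is unlabelled or ambiguous."""
    has_pos = any(marker in tweet for marker in POSITIVE)
    has_neg = any(marker in tweet for marker in NEGATIVE)
    # Skip tweets with no markers at all, or with markers of both polarities.
    if has_pos == has_neg:
        return None
    label = "positive" if has_pos else "negative"
    # Remove the markers so the label cannot be read directly off the text.
    text = tweet
    for marker in POSITIVE | NEGATIVE:
        text = text.replace(marker, "")
    # Collapse the whitespace left behind by the removed markers.
    text = " ".join(text.split())
    return text, label
```

For example, `distant_label("great day :)")` yields a positive example with the emoticon removed, while a tweet containing both `:)` and `:(` is discarded.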
Related papers
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z) - An LSTM model for Twitter Sentiment Analysis [0.0]
We have collected seven publicly available and manually annotated Twitter sentiment datasets.
We create a new training and testing dataset from the collected datasets.
We develop an LSTM model to classify sentiment of a tweet and evaluate the model with the new dataset.
arXiv Detail & Related papers (2022-12-04T10:42:46Z) - Decay No More: A Persistent Twitter Dataset for Learning Social Meaning [10.227026799075215]
We propose a new persistent English Twitter dataset for social meaning (PTSM).
PTSM consists of 17 social meaning datasets in 10 categories of tasks.
We experiment with two SOTA pre-trained language models and show that our PTSM can substitute the actual tweets with paraphrases with marginal performance loss.
arXiv Detail & Related papers (2022-04-10T06:07:54Z) - Extracting Feelings of People Regarding COVID-19 by Social Network
Mining [0.0]
A dataset of COVID-related tweets in the English language is collected.
More than two million tweets from March 23 to June 23 of 2020 are analyzed.
arXiv Detail & Related papers (2021-10-12T16:45:33Z) - Exploiting BERT For Multimodal Target SentimentClassification Through
Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Understanding the Hoarding Behaviors during the COVID-19 Pandemic using
Large Scale Social Media Data [77.34726150561087]
We analyze the hoarding and anti-hoarding patterns of over 42,000 unique Twitter users in the United States from March 1 to April 30, 2020.
We find the percentage of females in both hoarding and anti-hoarding groups is higher than that of the general Twitter user population.
The LIWC anxiety mean for the hoarding-related tweets is significantly higher than the baseline Twitter anxiety mean.
arXiv Detail & Related papers (2020-10-15T16:02:25Z) - Tweets Sentiment Analysis via Word Embeddings and Machine Learning
Techniques [1.345251051985899]
This paper aims to perform sentiment analysis of real-time 2019 election Twitter data using the feature selection model word2vec and the machine learning algorithm random forest for sentiment classification.
Word2vec improves the quality of features by considering the contextual semantics of words in a text, thereby improving the accuracy of machine learning and sentiment analysis.
arXiv Detail & Related papers (2020-07-05T08:10:30Z) - Neural Temporal Opinion Modelling for Opinion Prediction on Twitter [42.87769996249732]
We design a topic-driven attention mechanism to capture the dynamic topic shifts in the neighbourhood context.
Experimental results show that the proposed model predicts both the posting time and the stance labels of future tweets more accurately than a number of competitive baselines.
arXiv Detail & Related papers (2020-05-27T16:49:04Z) - Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline [47.434392695347924]
The RecSys 2020 Challenge was organized by ACM RecSys in partnership with Twitter using this dataset.
This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv Detail & Related papers (2020-04-28T23:54:33Z) - That Message Went Viral?! Exploratory Analytics and Sentiment Analysis
into the Propagation of Tweets [0.0]
We conducted an exploratory analysis on a dataset of over a million Tweets.
We identified the most popular messages, and analyzed the tweets on multiple endogenous dimensions.
We found some interesting patterns and uncovered new insights to help researchers and practitioners better understand the behavior of popular viral tweets.
arXiv Detail & Related papers (2020-04-21T02:38:53Z)