ViralBERT: A User Focused BERT-Based Approach to Virality Prediction
- URL: http://arxiv.org/abs/2206.10298v1
- Date: Tue, 17 May 2022 21:40:24 GMT
- Title: ViralBERT: A User Focused BERT-Based Approach to Virality Prediction
- Authors: Rikaz Rameez, Hossein A. Rahmani, Emine Yilmaz
- Abstract summary: We propose ViralBERT, which can be used to predict the virality of tweets using content- and user-based features.
We employ a method of concatenating numerical features such as hashtags and follower numbers to tweet text, and utilise two BERT modules.
We collect a dataset of 330k tweets to train ViralBERT and validate the efficacy of our model using baselines from current studies in this field.
- Score: 11.992815669875924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Twitter has become the social network of choice for sharing and
spreading information to a multitude of users through posts called 'tweets'.
Users can easily re-share these posts to other users through 'retweets', which
allow information to cascade to many more users, increasing its outreach.
Clearly, being able to know the extent to which a post can be retweeted has
great value in advertising, influencing and other such campaigns. In this paper
we propose ViralBERT, which can be used to predict the virality of tweets using
content- and user-based features. We employ a method of concatenating numerical
features such as hashtags and follower numbers to tweet text, and utilise two
BERT modules: one for semantic representation of the combined text and
numerical features, and another module purely for sentiment analysis of text,
as both the information within text and it's ability to elicit an emotional
response play a part in retweet proneness. We collect a dataset of 330k tweets
to train ViralBERT and validate the efficacy of our model using baselines from
current studies in this field. Our experiments show that our approach
outperforms these baselines, with a 13% increase in both F1 Score and Accuracy
compared to the best performing baseline method. We then undergo an ablation
study to investigate the importance of chosen features, finding that text
sentiment and follower counts, and to a lesser extent mentions and following
counts, are the strongest features for the model, and that hashtag counts are
detrimental to the model.
Related papers
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Retweet-BERT: Political Leaning Detection Using Language Features and
Information Diffusion on Social Networks [30.143148646797265]
We introduce Retweet-BERT, a simple and scalable model to estimate the political leanings of Twitter users.
Our assumptions stem from patterns of networks and linguistics homophily among people who share similar ideologies.
arXiv Detail & Related papers (2022-07-18T02:18:20Z) - Identification of Twitter Bots based on an Explainable ML Framework: the
US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
Supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z) - Exploiting BERT For Multimodal Target SentimentClassification Through
Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z) - News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in the tendency of the user to engage with both types of content, showing a slight preference for the questionable ones which may account for a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z) - Understanding Information Spreading Mechanisms During COVID-19 Pandemic
by Analyzing the Impact of Tweet Text and User Features for Retweet
Prediction [6.658785818853953]
COVID-19 has affected the world economy and the daily life routine of almost everyone.
Social media platforms enable users to share information with other users who can reshare this information.
We propose two CNN and RNN based models and evaluate the performance of these models on a publicly available TweetsCOV19 dataset.
arXiv Detail & Related papers (2021-05-26T15:55:58Z) - Towards A Sentiment Analyzer for Low-Resource Languages [0.0]
This research aims to analyse a sentiment of the users towards a particular trending topic that has been actively and massively discussed at that time.
We use the hashtag textit#kpujangancurang that was the trending topic during the Indonesia presidential election in 2019.
This research utilizes rapid miner tool to generate the twitter data and comparing Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods to classify the sentiment of the twitter data.
arXiv Detail & Related papers (2020-11-12T13:50:00Z) - Named Entity Recognition for Social Media Texts with Semantic
Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z) - TweetBERT: A Pretrained Language Representation Model for Twitter Text
Analysis [0.0]
We introduce two TweetBERT models, which are domain specific language presentation models, pre-trained on millions of tweets.
We show that the TweetBERT models significantly outperform the traditional BERT models in Twitter text mining tasks by more than 7% on each Twitter dataset.
arXiv Detail & Related papers (2020-10-17T00:45:02Z) - Writer Identification Using Microblogging Texts for Social Media
Forensics [53.180678723280145]
We evaluate popular stylometric features, widely used in literary analysis, and specific Twitter features like URLs, hashtags, replies or quotes.
We test varying sized author sets and varying amounts of training/test texts per author.
arXiv Detail & Related papers (2020-07-31T00:23:18Z) - Sentiment Analysis on Social Media Content [0.0]
The aim of this paper is to present a model that can perform sentiment analysis of real data collected from Twitter.
Data in Twitter is highly unstructured which makes it difficult to analyze.
Our proposed model is different from prior work in this field because it combined the use of supervised and unsupervised machine learning algorithms.
arXiv Detail & Related papers (2020-07-04T17:03:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.