BERT based classification system for detecting rumours on Twitter
- URL: http://arxiv.org/abs/2109.02975v1
- Date: Tue, 7 Sep 2021 10:15:54 GMT
- Title: BERT based classification system for detecting rumours on Twitter
- Authors: Rini Anggrainingsih, Ghulam Mubashar Hassan, Amitava Datta
- Abstract summary: We propose a novel approach that uses BERT sentence embedding to identify rumours on Twitter, rather than the usual feature extraction techniques.
We use sentence embedding using BERT to represent each tweet's sentences as a vector according to the contextual meaning of the tweet.
Our BERT based models improved the accuracy by approximately 10% as compared to previous methods.
- Score: 3.2872586139884623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The role of social media in opinion formation has far-reaching implications
in all spheres of society. Though social media provide platforms for expressing
news and views, it is hard to control the quality of posts due to the sheer
volumes of posts on platforms like Twitter and Facebook. Misinformation and
rumours have lasting effects on society, as they tend to influence people's
opinions and also may motivate people to act irrationally. It is therefore very
important to detect and remove rumours from these platforms. The only way to
prevent the spread of rumours is through automatic detection and classification
of social media posts. Our focus in this paper is the Twitter social medium, as
it is relatively easy to collect data from Twitter. The majority of previous
studies used supervised learning approaches to classify rumours on Twitter.
These approaches rely on feature extraction to obtain both content and context
features from the text of tweets to distinguish rumours and non-rumours.
Manually extracting features, however, is time-consuming considering the volume
of tweets. We propose a novel approach to deal with this problem by utilising
sentence embedding using BERT to identify rumours on Twitter, rather than the
usual feature extraction techniques. We use sentence embedding using BERT to
represent each tweet's sentences as a vector according to the contextual
meaning of the tweet. We classify those vectors into rumours or non-rumours by
using various supervised learning techniques. Our BERT based models improved
the accuracy by approximately 10% as compared to previous methods.
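
The two-stage pipeline described in the abstract (embed each tweet as a vector, then classify the vectors with a supervised learner) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a BERT sentence embedding (it is not contextual), `NearestCentroid` is just one simple supervised technique chosen for brevity, and the example tweets and labels are invented. In a real system the `embed` argument would presumably be replaced by a BERT encoder.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for a BERT sentence embedding (NOT contextual).

    Hashes each lowercased token into one of `dim` buckets, then
    L2-normalises the resulting count vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class NearestCentroid:
    """Minimal supervised classifier: one mean vector per label."""

    def fit(self, vectors: Sequence[Vector], labels: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = {}
        for vec, label in zip(vectors, labels):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [x / counts[label] for x in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, vectors: Sequence[Vector]) -> List[str]:
        def sq_dist(a: Vector, b: Vector) -> float:
            return sum((x - y) ** 2 for x, y in zip(a, b))

        # Assign each vector to the label with the closest centroid.
        return [
            min(self.centroids, key=lambda lab: sq_dist(vec, self.centroids[lab]))
            for vec in vectors
        ]


def train_rumour_classifier(
    tweets: Sequence[str],
    labels: Sequence[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> NearestCentroid:
    """Stage 1: embed every tweet. Stage 2: fit a classifier on the vectors."""
    return NearestCentroid().fit([embed(t) for t in tweets], labels)


def classify(
    model: NearestCentroid,
    tweets: Sequence[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> List[str]:
    """Embed unseen tweets with the same function and predict their labels."""
    return model.predict([embed(t) for t in tweets])


# Invented toy data, for illustration only.
train_tweets = [
    "unconfirmed reports say the bridge has collapsed",
    "sources claim the airport is closed, not verified",
    "the city council published the meeting schedule today",
    "official statement: the museum reopens on monday",
]
train_labels = ["rumour", "rumour", "non-rumour", "non-rumour"]

model = train_rumour_classifier(train_tweets, train_labels)
predictions = classify(model, ["unconfirmed reports about the airport"])
print(predictions)
```

Keeping the embedding function as a parameter means the classifier stage does not depend on how the vectors are produced, so the stand-in can be swapped for a real sentence encoder without touching the rest of the sketch.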
Related papers
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Which tweets 'deserve' to be included in news stories? Chronemics of tweet embedding [0.0]
The study focuses on the pressures of immediacy on the media ecosystems.
By analyzing a large corpus of news outlets that have embedded tweets, this study analyzes tweet embedding practices.
We ask two main questions: which types of outlets are quicker to embed tweets, and which types of users' tweets are more likely to be embedded quickly?
arXiv Detail & Related papers (2022-11-16T20:08:35Z)
- ViralBERT: A User Focused BERT-Based Approach to Virality Prediction [11.992815669875924]
We propose ViralBERT, which can be used to predict the virality of tweets using content- and user-based features.
We employ a method of concatenating numerical features such as hashtags and follower numbers to tweet text, and utilise two BERT modules.
We collect a dataset of 330k tweets to train ViralBERT and validate the efficacy of our model using baselines from current studies in this field.
arXiv Detail & Related papers (2022-05-17T21:40:24Z)
- Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- What goes on inside rumour and non-rumour tweets and their reactions: A Psycholinguistic Analyses [58.75684238003408]
Psycho-linguistic analyses of social media text are vital for drawing meaningful conclusions to mitigate misinformation.
This research contributes by performing an in-depth psycholinguistic analysis of rumours related to various kinds of events.
arXiv Detail & Related papers (2021-11-09T07:45:11Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in the tendency of the user to engage with both types of content, showing a slight preference for the questionable ones which may account for a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Understanding Information Spreading Mechanisms During COVID-19 Pandemic by Analyzing the Impact of Tweet Text and User Features for Retweet Prediction [6.658785818853953]
COVID-19 has affected the world economy and the daily life routine of almost everyone.
Social media platforms enable users to share information with other users who can reshare this information.
We propose two CNN and RNN based models and evaluate the performance of these models on a publicly available TweetsCOV19 dataset.
arXiv Detail & Related papers (2021-05-26T15:55:58Z)
- Sentiment Analysis on Social Media Content [0.0]
The aim of this paper is to present a model that can perform sentiment analysis of real data collected from Twitter.
Data on Twitter is highly unstructured, which makes it difficult to analyze.
Our proposed model is different from prior work in this field because it combined the use of supervised and unsupervised machine learning algorithms.
arXiv Detail & Related papers (2020-07-04T17:03:30Z)
- Heterogeneous Graph Attention Networks for Early Detection of Rumors on Twitter [9.358510255345676]
False rumors on social media can bring about the panic of the public and damage personal reputation.
We construct a tweet-word-user heterogeneous graph based on the text contents and the source tweet propagations of rumors.
arXiv Detail & Related papers (2020-06-10T14:49:08Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.