An NLP-Assisted Bayesian Time Series Analysis for Prevalence of Twitter
Cyberbullying During the COVID-19 Pandemic
- URL: http://arxiv.org/abs/2208.04980v1
- Date: Sat, 23 Jul 2022 15:24:07 GMT
- Title: An NLP-Assisted Bayesian Time Series Analysis for Prevalence of Twitter
Cyberbullying During the COVID-19 Pandemic
- Authors: Christopher Perez, Sayar Karmakar
- Abstract summary: 1 million tweets containing keywords associated with cyberbullying were collected from the beginning of 2019 to the end of 2021.
A natural language processing model pre-trained on a Twitter corpus generated probabilities for the tweets being offensive and hateful.
Results reveal strong weekly and yearly seasonality in hateful speech but with slight differences across years that may be attributed to COVID-19.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: COVID-19 has brought about many changes in social dynamics. Stay-at-home
orders and disruptions in school teaching can influence bullying behavior
in person and online, both of which can lead to negative outcomes for victims. To
study cyberbullying specifically, 1 million tweets containing keywords
associated with abuse were collected from the beginning of 2019 to the end of
2021 with the Twitter API search endpoint. A natural language processing model
pre-trained on a Twitter corpus generated probabilities for the tweets being
offensive and hateful. To overcome limitations of sampling, data were also
collected using the count endpoint. The fraction of tweets in a given daily
sample marked as abusive is multiplied by the total reported by the count
endpoint. Once these adjusted counts are assembled, a Bayesian autoregressive
Poisson model allows one to study the mean trend and lag functions of the data
and how they vary over time. The results reveal strong weekly and yearly
seasonality in hateful speech but with slight differences across years that may
be attributed to COVID-19.
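The adjustment step described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: the function names, the probability cutoff of 0.5, and the log-linear parameterization of the autoregressive Poisson mean are all assumptions made for the example.

```python
import math

def adjusted_abusive_count(sample_probs, threshold, total_daily_count):
    """Scale the daily count-endpoint total by the abusive fraction of the sample.

    sample_probs: model probabilities that each sampled tweet is abusive
    threshold: probability cutoff for marking a tweet abusive (assumed value)
    total_daily_count: total matching tweets reported by the count endpoint
    """
    flagged = sum(1 for p in sample_probs if p >= threshold)
    return (flagged / len(sample_probs)) * total_daily_count

def poisson_ar_mean(intercept, lag_coefs, history):
    """One common log-linear mean for an autoregressive Poisson count model:
    lambda_t = exp(b0 + sum_j b_j * log(1 + y_{t-j})), an assumed form."""
    lagged = history[-len(lag_coefs):][::-1]  # most recent lag first
    return math.exp(intercept + sum(b * math.log1p(y)
                                    for b, y in zip(lag_coefs, lagged)))

# 3 of 5 sampled tweets clear the 0.5 cutoff; count endpoint reports 10,000 tweets.
print(adjusted_abusive_count([0.9, 0.2, 0.7, 0.8, 0.1], 0.5, 10_000))  # 6000.0
```

In a Bayesian fit, the intercept and lag coefficients would receive priors and be sampled (e.g. by MCMC), yielding posterior distributions for the trend and lag functions rather than point estimates.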
Related papers
- Sentiment Analysis of Cyberbullying Data in Social Media [0.0]
Our work focuses on leveraging deep learning and natural language understanding techniques to detect traces of bullying in social media posts.
One approach utilizes BERT embeddings, while the other replaces the embeddings layer with the recently released embeddings API from OpenAI.
We conducted a performance comparison between these two approaches to evaluate their effectiveness in sentiment analysis of Formspring Cyberbullying data.
arXiv Detail & Related papers (2024-11-08T20:41:04Z) - CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster
Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z) - Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 × 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z) - A deep dive into the consistently toxic 1% of Twitter [9.669275987983447]
This study spans 14 years of tweets from 122K Twitter profiles and more than 293M tweets.
We selected the most extreme profiles in terms of consistency of toxic content and examined their tweet texts, and the domains, hashtags, and URLs they shared.
We found that these selected profiles keep to a narrow theme with lower diversity in hashtags, URLs, and domains, they are thematically similar to each other, and have a high likelihood of bot-like behavior.
arXiv Detail & Related papers (2022-02-16T04:21:48Z) - Identification of Twitter Bots based on an Explainable ML Framework: the
US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z) - Probabilistic Impact Score Generation using Ktrain-BERT to Identify Hate
Words from Twitter Discussions [0.5735035463793008]
This paper presents experimentation with a Keras wrapped lightweight BERT model to successfully identify hate speech.
The dataset used for this task is the Hate Speech and Offensive Content Detection (HASOC 2021) data from FIRE 2021 in English.
Our system obtained a validation accuracy of 82.60%, with a maximum F1-Score of 82.68%.
arXiv Detail & Related papers (2021-11-25T06:35:49Z) - Evaluating the Impact of COVID-19 on Cyberbullying through Bayesian
Trend Analysis [1.52292571922932]
We analyze cyberbullying-related public tweets (N=454,046) posted between January 1st, 2020 and June 7th, 2020.
We show that this new Bayesian method can clearly show the upward trend on cyberbullying-related tweets since mid-March 2020.
Our work emphasizes a critical issue of cyberbullying and how a global crisis impacts social media abuse.
arXiv Detail & Related papers (2020-08-08T00:01:32Z) - Change-Point Analysis of Cyberbullying-Related Twitter Discussions
During COVID-19 [1.2891210250935146]
An increase in social media usage has also been observed, leading to the suspicion that cyberbullying has also increased.
To evaluate this trend, we collected 454,046 cyberbullying-related public tweets posted between January 1st, 2020 and June 7th, 2020.
Almost all these changepoint time-locations can be attributed to COVID-19, which substantiates our initial hypothesis of an increase in cyberbullying through analysis of discussions over Twitter.
arXiv Detail & Related papers (2020-08-07T22:50:42Z) - Writer Identification Using Microblogging Texts for Social Media
Forensics [53.180678723280145]
We evaluate popular stylometric features, widely used in literary analysis, and specific Twitter features like URLs, hashtags, replies or quotes.
We test varying sized author sets and varying amounts of training/test texts per author.
arXiv Detail & Related papers (2020-07-31T00:23:18Z) - Detecting Perceived Emotions in Hurricane Disasters [62.760131661847986]
We introduce HurricaneEmo, an emotion dataset of 15,000 English tweets spanning three hurricanes: Harvey, Irma, and Maria.
We present a comprehensive study of fine-grained emotions and propose classification tasks to discriminate between coarse-grained emotion groups.
arXiv Detail & Related papers (2020-04-29T16:17:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.