The Use of a Large Language Model for Cyberbullying Detection
- URL: http://arxiv.org/abs/2402.04088v1
- Date: Tue, 6 Feb 2024 15:46:31 GMT
- Title: The Use of a Large Language Model for Cyberbullying Detection
- Authors: Bayode Ogunleye, Babitha Dharmaraj
- Abstract summary: Cyberbullying (CB) is one of the most prevalent phenomena in today's cyber world.
It is a severe threat to the mental and physical health of citizens.
This creates the need for a robust system to prevent bullying content on online forums, blogs, and social media platforms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dominance of social media has given perpetrators additional channels for bullying. Unfortunately, cyberbullying (CB) is one of the most prevalent phenomena in today's cyber world and a severe threat to the mental and physical health of citizens. This creates the need for robust systems that prevent bullying content on online forums, blogs, and social media platforms and manage its impact on society. Several machine learning (ML) algorithms have been proposed for this purpose, but their performance is inconsistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) such as BERT and RoBERTa have achieved state-of-the-art (SOTA) results on several natural language processing (NLP) tasks. However, LLMs have not been applied extensively to CB detection. In this paper, we explore the use of these models for CB detection. We prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results on datasets D1 and D2 show that RoBERTa outperformed the other models.
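The approach described in the abstract amounts to fine-tuning a pre-trained transformer encoder as a binary bullying/non-bullying classifier. The sketch below is a rough, hypothetical illustration (not the authors' released code): it fine-tunes roberta-base with the Hugging Face Transformers Trainer, where the CSV file names, the column names ("text", "label"), and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch: fine-tuning RoBERTa for binary cyberbullying classification.
# File names, column names, and hyperparameters are placeholder assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Hypothetical CSV splits combining Formspring and Twitter posts:
# one "text" column and one 0/1 "label" column per row.
dataset = load_dataset("csv", data_files={"train": "d2_train.csv", "test": "d2_test.csv"})

def tokenize(batch):
    # Pad/truncate every post to a fixed length so the default collator can batch them.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="roberta-cb",          # checkpoints and logs
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

trainer.train()
print(trainer.evaluate())  # reports eval loss on the held-out split
```

Swapping "roberta-base" for another checkpoint such as "bert-base-uncased" would give the analogous BERT baseline under the same setup.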
Related papers
- Explain Thyself Bully: Sentiment Aided Cyberbullying Detection with
Explanation [52.3781496277104]
Cyberbullying has become a big issue with the popularity of different social media networks and online communication apps.
Recent laws, such as the "right to explanation" in the General Data Protection Regulation, have spurred research in developing interpretable models.
We develop the first interpretable multi-task model, called mExCB, for automatic cyberbullying detection in code-mixed languages.
arXiv Detail & Related papers (2024-01-17T07:36:22Z) - Deep Learning Based Cyberbullying Detection in Bangla Language [0.0]
This study demonstrates a deep learning strategy for identifying cyberbullying in Bengali.
A two-layer bidirectional long short-term memory (Bi-LSTM) model has been built to identify cyberbullying.
arXiv Detail & Related papers (2024-01-07T04:58:59Z) - Factuality Challenges in the Era of Large Language Models [113.3282633305118]
Large Language Models (LLMs) can generate false, erroneous, or misleading content.
LLMs can be exploited for malicious applications.
This poses a significant challenge to society in terms of the potential deception of users.
arXiv Detail & Related papers (2023-10-08T14:55:02Z) - Cyberbullying Detection for Low-resource Languages and Dialects: Review
of the State of the Art [0.9831489366502298]
The paper covers 23 low-resource languages and dialects, including Bangla, Hindi, and the Dravidian languages, among others.
In the survey, we identify some of the research gaps of previous studies, which include the lack of reliable definitions of cyberbullying.
Based on the suggestions proposed in the survey, we collect and release a cyberbullying dataset in the Chittagonian dialect of Bangla.
arXiv Detail & Related papers (2023-08-30T03:52:28Z) - Res-CNN-BiLSTM Network for overcoming Mental Health Disturbances caused
due to Cyberbullying through Social Media [3.1871776847712523]
Cyberbullying occurs on the basis of religion, ethnicity, age, and gender.
Social media is the medium, and it generates massive amounts of data in textual form.
arXiv Detail & Related papers (2022-04-20T18:40:39Z) - BERTuit: Understanding Spanish language in Twitter through a native
transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z) - Analysing Cyberbullying using Natural Language Processing by
Understanding Jargon in Social Media [4.932130498861987]
In our work, we explore binary classification by using a combination of datasets from various social media platforms.
We experiment with multiple models, such as Bi-LSTM, GloVe, and state-of-the-art models like BERT, and apply a unique preprocessing technique by introducing a slang-abusive corpus.
arXiv Detail & Related papers (2021-04-23T04:20:19Z) - hBert + BiasCorp -- Fighting Racism on the Web [58.768804813646334]
We are releasing BiasCorp, a dataset containing 139,090 comments and news segments from three specific sources: Fox News, BreitbartNews, and YouTube.
In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer.
We are also releasing a JavaScript library and a Chrome Extension Application, to help developers make use of our trained model in web applications.
arXiv Detail & Related papers (2021-04-06T02:17:20Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z) - Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining
Representations for Cyberbullying Classification [4.945634077636197]
We study the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects.
These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.
arXiv Detail & Related papers (2020-04-04T00:35:16Z)