Related papers: Assessing Text Classification Methods for Cyberbullying Detection on Social Media Platforms

URL: http://arxiv.org/abs/2412.19928v1
Date: Fri, 27 Dec 2024 21:22:28 GMT
Title: Assessing Text Classification Methods for Cyberbullying Detection on Social Media Platforms
Authors: Adamu Gaston Philipo, Doreen Sebastian Sarwatt, Jianguo Ding, Mahmoud Daneshmand, Huansheng Ning
Abstract summary: This research aims to adapt and evaluate existing text classification techniques within the cyberbullying detection domain. It focuses on leveraging and assessing large language models, including BERT, RoBERTa, XLNet, DistilBERT, and GPT-2.0. The results show that BERT strikes a balance between performance, time efficiency, and computational resources.
Score: 3.235558067839701
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cyberbullying significantly contributes to mental health issues in communities by negatively impacting the psychology of victims. It is a prevalent problem on social media platforms, necessitating effective, real-time detection and monitoring systems to identify harmful messages. However, current cyberbullying detection systems face challenges related to performance, dataset quality, time efficiency, and computational costs. This research aims to conduct a comparative study by adapting and evaluating existing text classification techniques within the cyberbullying detection domain. The study specifically evaluates the effectiveness and performance of these techniques in identifying cyberbullying instances on social media platforms. It focuses on leveraging and assessing large language models, including BERT, RoBERTa, XLNet, DistilBERT, and GPT-2.0, for their suitability in this domain. The results show that BERT strikes a balance between performance, time efficiency, and computational resources: Accuracy of 95%, Precision of 95%, Recall of 95%, F1 Score of 95%, Error Rate of 5%, Inference Time of 0.053 seconds, RAM Usage of 35.28 MB, CPU/GPU Usage of 0.4%, and Energy Consumption of 0.000263 kWh. The findings demonstrate that generative AI models, while powerful, do not consistently outperform fine-tuned models on the tested benchmarks. However, state-of-the-art performance can still be achieved through strategic adaptation and fine-tuning of existing models for specific datasets and tasks.
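The abstract reports accuracy, precision, recall, F1 score, and error rate side by side. As a minimal sketch of how these quantities relate for a binary classifier, the function below derives them from confusion-matrix counts. The function name and the counts are hypothetical, chosen only so that every metric comes out at 95% (and the error rate at 5%); they are not the paper's data, and the paper may use a different averaging scheme across classes.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard binary-classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1.0 - accuracy,  # error rate is the complement of accuracy
    }


# Hypothetical counts (assumption, not taken from the paper).
metrics = classification_metrics(tp=95, fp=5, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With balanced errors like these, precision, recall, and F1 coincide with accuracy, which is one way a model can report the same value for all four.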

Related papers

A Hybrid DeBERTa and Gated Broad Learning System for Cyberbullying Detection in English Text [0.356008609689971]
Cyberbullying affects approximately 54.4% of teenagers according to recent research. This paper presents a hybrid architecture that combines the contextual understanding capabilities of transformer-based models with the pattern recognition strengths of broad learning systems for effective cyberbullying detection.
arXiv Detail & Related papers (2025-06-19T06:15:22Z)
Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data [0.0]
This study uses hybrid sampling techniques to improve data imbalance detection accuracy in IoT domains. We evaluate the performance of several machine learning models with respect to the classification of cyber-attacks. Overall, this work demonstrates the value of hybrid sampling combined with robust model and feature selection for significantly improving IoT security.
arXiv Detail & Related papers (2025-05-15T14:02:48Z)
CAGN-GAT Fusion: A Hybrid Contrastive Attentive Graph Neural Network for Network Intrusion Detection [0.7067443325368975]
We propose the fusion of a Contrastive Attentive Graph Network and Graph Attention Network (CAGN-GAT Fusion). We benchmark it against 15 other models, including both Graph Neural Networks (GNNs) and traditional ML models. Results show that CAGN-GAT Fusion demonstrates stable and competitive accuracy, recall, and F1-score, even though it does not achieve the highest performance in every dataset.
arXiv Detail & Related papers (2025-03-02T17:01:00Z)
Identifying Cyberbullying Roles in Social Media [3.5568310805420427]
It is critical to accurately detect the roles of individuals involved in cyberbullying incidents to effectively address the issue on a large scale. This study explores the use of machine learning models to detect the roles involved in cyberbullying interactions.
arXiv Detail & Related papers (2024-12-21T00:46:48Z)
Scalable and Effective Negative Sample Generation for Hyperedge Prediction [55.9298019975967]
Hyperedge prediction is crucial for understanding complex multi-entity interactions in web-based applications. Traditional methods often face difficulties in generating high-quality negative samples due to imbalance between positive and negative instances. We present the scalable and effective negative sample generation for Hyperedge Prediction (SEHP) framework, which utilizes diffusion models to tackle these challenges.
arXiv Detail & Related papers (2024-11-19T09:16:25Z)
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities. TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models. Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z)
Optimizing Transformer based on high-performance optimizer for predicting employment sentiment in American social media content [9.49688045612671]
This article improves the Transformer model based on a swarm intelligence optimization algorithm, aiming to predict the emotions expressed in employment-related text on American social media. During training, the model's accuracy gradually increased from 49.27% to 82.83%, while the loss decreased from 0.67 to 0.35. The improved model not only raises the accuracy of sentiment recognition in employment-related social media text but also has practical significance.
arXiv Detail & Related papers (2024-10-09T03:14:05Z)
How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models [95.44559524735308]
Verification based on large language or multimodal models has been proposed to scale up online policing mechanisms for mitigating the spread of false and harmful content. We test the limits of improving foundation model performance without continual updating through an initial study of knowledge transfer. Our results on two recent multi-modal fact-checking benchmarks, Mocheg and Fakeddit, indicate that knowledge transfer strategies can improve Fakeddit performance over the state-of-the-art by up to 1.7% and Mocheg performance by up to 2.9%.
arXiv Detail & Related papers (2024-06-29T08:39:07Z)
Deep Learning Approaches for Detecting Adversarial Cyberbullying and Hate Speech in Social Networks [0.0]
This paper focuses on detecting cyberbullying in adversarial attack content within social networking site text data, with a specific emphasis on hate speech. An LSTM model trained for a fixed 100 epochs performed well, achieving accuracy, precision, recall, F1-score, and AUC-ROC of 87.57%, 88.73%, 87.57%, 88.15%, and 91%, respectively.
arXiv Detail & Related papers (2024-05-30T21:44:15Z)
Investigating the Limitation of CLIP Models: The Worst-Performing Categories [53.360239882501325]
Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance.
arXiv Detail & Related papers (2023-10-05T05:37:33Z)
One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations [59.17685450892182]
We investigate the behavior of deep representations in widely used CNN models under extreme data scarcity for One-Shot periocular recognition. We improved state-of-the-art results that made use of networks trained with biometric datasets with millions of images. Traditional algorithms like SIFT can outperform CNNs in situations with limited data.
arXiv Detail & Related papers (2023-07-11T09:10:16Z)
Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
Robust Trajectory Prediction against Adversarial Attacks [84.10405251683713]
Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving systems. These methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions. In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks.
arXiv Detail & Related papers (2022-07-29T22:35:05Z)
DAPPER: Label-Free Performance Estimation after Personalization for Heterogeneous Mobile Sensing [95.18236298557721]
We present DAPPER (Domain AdaPtation Performance EstimatoR) that estimates the adaptation performance in a target domain with unlabeled target data. Our evaluation with four real-world sensing datasets compared against six baselines shows that DAPPER outperforms the state-of-the-art baseline by 39.8% in estimation accuracy.
arXiv Detail & Related papers (2021-11-22T08:49:33Z)
Bayesian Active Learning for Wearable Stress and Affect Detection [0.7106986689736827]
Stress detection using on-device deep learning algorithms has been on the rise owing to advancements in pervasive computing. In this paper, we propose a framework with capabilities to represent model uncertainties through approximations in Bayesian Neural Networks. Our proposed framework achieves a considerable efficiency boost during inference, with a substantially low number of acquired pool points.
arXiv Detail & Related papers (2020-12-04T16:19:37Z)
Multi-Stage Optimized Machine Learning Framework for Network Intrusion Detection [8.26773636337474]
This paper proposes a novel multi-stage optimized ML-based NIDS framework. It reduces computational complexity while maintaining its detection performance. The proposed framework significantly reduces the required training sample size (up to 74%) and feature set size (up to 50%).
arXiv Detail & Related papers (2020-08-09T03:18:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.