Efficient Hate Speech Detection: Evaluating 38 Models from Traditional Methods to Transformers
- URL: http://arxiv.org/abs/2509.14266v1
- Date: Sun, 14 Sep 2025 21:17:04 GMT
- Title: Efficient Hate Speech Detection: Evaluating 38 Models from Traditional Methods to Transformers
- Authors: Mahmoud Abusaqer, Jamil Saquer, Hazim Shatnawi
- Abstract summary: This study evaluates 38 model configurations in detecting hate speech across datasets ranging from 6.5K to 451K samples. Our results show that transformers, particularly RoBERTa, consistently achieve superior performance with accuracy and F1-scores exceeding 90%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of hate speech on social media necessitates automated detection systems that balance accuracy with computational efficiency. This study evaluates 38 model configurations in detecting hate speech across datasets ranging from 6.5K to 451K samples. We analyze transformer architectures (e.g., BERT, RoBERTa, Distil-BERT), deep neural networks (e.g., CNN, LSTM, GRU, Hierarchical Attention Networks), and traditional machine learning methods (e.g., SVM, CatBoost, Random Forest). Our results show that transformers, particularly RoBERTa, consistently achieve superior performance with accuracy and F1-scores exceeding 90%. Among deep learning approaches, Hierarchical Attention Networks yield the best results, while traditional methods like CatBoost and SVM remain competitive, achieving F1-scores above 88% with significantly lower computational costs. Additionally, our analysis highlights the importance of dataset characteristics, with balanced, moderately sized unprocessed datasets outperforming larger, preprocessed datasets. These findings offer valuable insights for developing efficient and effective hate speech detection systems.
Related papers
- AI Generated Text Detection
This paper presents an evaluation of AI text detection methods, including both traditional machine learning models and transformer-based architectures. We utilize two datasets, HC3 and DAIGT v2, to build a unified benchmark and apply a topic-based data split to prevent information leakage. Results indicate that contextual modeling is significantly superior to lexical features and highlight the importance of mitigating topic memorization.
arXiv Detail & Related papers (2026-01-07T11:18:10Z) - Hyperparameter Tuning-Based Optimized Performance Analysis of Machine Learning Algorithms for Network Intrusion Detection
Network Intrusion Detection Systems (NIDS) are essential for securing networks by identifying and mitigating unauthorized activities. This study explores the application of machine learning (ML) methods to improve NIDS accuracy.
arXiv Detail & Related papers (2025-12-14T15:02:48Z) - CSI-4CAST: A Hybrid Deep Learning Model for CSI Prediction with Comprehensive Robustness and Generalization Testing
This paper introduces CSI-4CAST, a hybrid deep learning architecture that integrates four key components: Convolutional neural network residuals, Adaptive correction layers, ShuffleNet blocks, and Transformers. The dataset spans multiple channel models, a wide range of delay spreads and user velocities, and diverse noise types and intensities. Experimental results show that CSI-4CAST achieves superior prediction accuracy with substantially lower computational cost.
arXiv Detail & Related papers (2025-10-14T21:19:52Z) - Exploring the Relationship between Brain Hemisphere States and Frequency Bands through Deep Learning Optimization Techniques
This study investigates performance across EEG frequency bands using various convolutions and evaluates efficient class prediction for the left and right hemispheres. Adagrad and RMSprop consistently perform well across different frequency bands, with Adadelta exhibiting robust performance in cross-model evaluations. The deep dense network shows competitive performance in learning complex patterns, whereas the shallow three-layer network, though sometimes less accurate, provides computational efficiency.
arXiv Detail & Related papers (2025-09-17T15:26:45Z) - CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models
Causal Attention Tuning (CAT) is a novel approach that injects fine-grained causal knowledge into the attention mechanism. We propose an automated pipeline that leverages human priors to automatically generate token-level causal signals. CAT achieves an average improvement of 5.76% on the STG dataset and 1.56% on downstream tasks.
arXiv Detail & Related papers (2025-09-01T15:13:15Z) - Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
Speech Emotion Recognition (SER) presents a significant yet persistent challenge in human-computer interaction. We develop and evaluate a suite of machine learning models, including Support Vector Machines (SVMs), Long Short-Term Memory networks (LSTMs), and Convolutional Neural Networks (CNNs). We demonstrate that by strategically employing transfer learning and innovative data augmentation techniques, our models can achieve impressive performance despite the constraints of a relatively small dataset.
arXiv Detail & Related papers (2025-08-26T19:08:54Z) - Efficient Fault Detection in WSN Based on PCA-Optimized Deep Neural Network Slicing Trained with GOA
Traditional fault detection methods often struggle with optimizing deep neural networks (DNNs) for efficient performance. This study proposes a novel hybrid method combining Principal Component Analysis (PCA) with a DNN optimized by the Grasshopper Optimization Algorithm (GOA) to address these limitations. Our approach achieves a remarkable 99.72% classification accuracy, with exceptional precision and recall, outperforming conventional methods.
arXiv Detail & Related papers (2025-05-11T15:51:56Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment
Task-oriented edge computing addresses real-time inference demands by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Evaluating raw waveforms with deep learning frameworks for speech emotion recognition
We present a model that feeds raw audio files directly into deep neural networks without any feature extraction stage.
We use six datasets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS.
The proposed model achieves 90.34% accuracy for EMO-DB (CNN), 90.42% for RAVDESS, 99.48% for TESS (LSTM), 69.72% for CREMA (CNN), and 85.76% for SAVEE (CNN).
arXiv Detail & Related papers (2023-07-06T07:27:59Z) - Transformer-based approaches to Sentiment Detection
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6%, making it the recommended choice for high-quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z) - From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z) - SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
arXiv Detail & Related papers (2021-06-10T04:21:20Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with an improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z) - A Survey on Impact of Transient Faults on BNN Inference Accelerators
The big data boom makes very large data sets easy to access and analyze.
Deep learning models require significant computation power and extremely high memory accesses.
In this study, we demonstrate that the impact of soft errors on a customized deep learning algorithm might cause drastic image misclassification.
arXiv Detail & Related papers (2020-04-10T16:15:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.