Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework
- URL: http://arxiv.org/abs/2511.06051v1
- Date: Sat, 08 Nov 2025 15:47:18 GMT
- Title: Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework
- Authors: Mahmoud El-Bahnasawi
- Abstract summary: This paper addresses the challenge of developing computationally efficient hate speech detection systems. We propose a novel three-layer framework that combines rule-based pre-filtering with a parameter-efficient LoRA-tuned BERTweet model. Our approach achieves 94% of the performance of state-of-the-art large language models like SafePhi.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the critical challenge of developing computationally efficient hate speech detection systems that maintain competitive performance while being practical for real-time deployment. We propose a novel three-layer framework that combines rule-based pre-filtering with a parameter-efficient LoRA-tuned BERTweet model and continuous learning capabilities. Our approach achieves 0.85 macro F1 score - representing 94% of the performance of state-of-the-art large language models like SafePhi (Phi-4 based) while using a base model that is 100x smaller (134M vs 14B parameters). Compared to traditional BERT-based approaches with similar computational requirements, our method demonstrates superior performance through strategic dataset unification and optimized fine-tuning. The system requires only 1.87M trainable parameters (1.37% of full fine-tuning) and trains in approximately 2 hours on a single T4 GPU, making robust hate speech detection accessible in resource-constrained environments while maintaining competitive accuracy for real-world deployment.
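As a rough illustration of how the first two layers could be wired together, the sketch below pairs a keyword pre-filter with a LoRA-adapted BERTweet classifier via Hugging Face PEFT. The LoRA rank, target modules, and keyword list are illustrative assumptions; the paper reports 1.87M trainable parameters (1.37% of full fine-tuning), but this snippet does not reproduce the authors' exact configuration.

```python
# Minimal sketch of layers 1-2: a rule-based pre-filter in front of a
# LoRA-adapted BERTweet classifier (Hugging Face PEFT). The rank, target
# modules, and keyword list are assumptions, not the paper's exact config.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=2  # hate / non-hate
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                               # assumed LoRA rank
    lora_alpha=32,                      # assumed scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # assumed attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 134M base
model.eval()

# Hypothetical stand-in for the paper's rule-based keyword layer.
OBVIOUS_TERMS = {"placeholder_slur_1", "placeholder_slur_2"}

def classify(text: str) -> int:
    """Return 1 for hate speech, 0 otherwise."""
    if any(term in text.lower() for term in OBVIOUS_TERMS):
        return 1  # layer 1: cheap rule hit, skip the model entirely
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    return model(**inputs).logits.argmax(dim=-1).item()  # layer 2
```

The paper's third layer, continuous learning, would periodically fine-tune the adapters on newly labeled examples; that loop is omitted here for brevity.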
Related papers
- Chain of Simulation: A Dual-Mode Reasoning Framework for Large Language Models with Dynamic Problem Routing [0.0]
Chain of Simulation (CoS) is a novel dual-mode reasoning framework that dynamically routes problems to specialized reasoning strategies. CoS employs three distinct reasoning modes: computational flow with self-consistency for mathematical problems, symbolic state tracking with representations for spatial reasoning, and hybrid fact-extraction for multi-hop inference.
arXiv Detail & Related papers (2026-02-02T21:44:01Z)
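For intuition, the routing step CoS describes can be pictured as a small dispatcher. The keyword heuristics and mode names below are hypothetical stand-ins, not the paper's classifier:

```python
# Hypothetical sketch of dynamic problem routing in the spirit of CoS;
# the keyword heuristics and handlers are illustrative stand-ins only.
import re

def route(problem: str) -> str:
    text = problem.lower()
    if re.search(r"\d", text) and any(
        w in text for w in ("sum", "solve", "compute", "how many")
    ):
        return "computational_flow"   # math: computation with self-consistency
    if any(w in text for w in ("left of", "above", "rotate", "north of")):
        return "symbolic_state"       # spatial: explicit state tracking
    return "hybrid_fact_extraction"   # multi-hop: fact extraction

print(route("Compute the sum of the first 100 integers"))  # computational_flow
```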
- Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain [0.0]
This paper presents Mecellem models, a framework for developing specialized language models for the Turkish legal domain. We make two contributions: (1) Encoder Model Pre-trained from Scratch: ModernBERT-based bidirectional encoders pre-trained on a Turkish-dominant corpus of 112.7 billion tokens; and (2) Decoder Model with Continual Pre-training (CPT): Qwen3-1.7B and Qwen3-4B models adapted to the Turkish legal domain through controlled curriculum learning.
arXiv Detail & Related papers (2026-01-22T14:41:32Z)
- Teaching Language Models to Reason with Tools [73.21700643314917]
We present Hint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. CoRT significantly enhances efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model.
arXiv Detail & Related papers (2025-10-23T08:41:44Z)
- Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model [100.86587937568832]
Ring-1T is the first open-source, state-of-the-art thinking model at the trillion-parameter scale. It features 1 trillion total parameters and activates approximately 50 billion per token.
arXiv Detail & Related papers (2025-10-21T17:46:14Z)
- Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation [3.313347968067735]
We propose a workflow for speech emotion recognition using pre-trained representations and HPO strategies. Experiments run on 8 CPU cores with 32 GB RAM. For cross-lingual generalisation, an EmoDB-trained HPO-tuned model improves zero-shot accuracy by 0.25 on CREMA-D and 0.26 on RAVDESS.
arXiv Detail & Related papers (2025-10-08T14:20:43Z)
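The HPO component of such a workflow might be organized as an Optuna-style search; the search space and the train_and_eval objective below are illustrative assumptions, not the authors' setup:

```python
# Generic hyper-parameter optimisation loop in the spirit of the paper's
# workflow; search space and objective are assumptions, not their config.
import optuna

def train_and_eval(lr: float, dropout: float, batch_size: int) -> float:
    """Hypothetical stand-in: fine-tune the SER model, return val accuracy."""
    return 1.0 / (1.0 + abs(lr - 3e-4)) - dropout * 0.1  # dummy surface

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
    return train_and_eval(lr=lr, dropout=dropout, batch_size=batch_size)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```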
- Systematic Optimization of Open Source Large Language Models for Mathematical Reasoning [1.8254074486719114]
This paper presents a practical investigation into fine-tuning model parameters for mathematical reasoning tasks. A holistically optimized framework is introduced for five state-of-the-art models.
arXiv Detail & Related papers (2025-09-08T21:31:43Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- Learning Adaptive Parallel Reasoning with Language Models [70.1745752819628]
We propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures.
arXiv Detail & Related papers (2025-04-21T22:29:02Z)
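The spawn()/join() pattern APR names maps naturally onto thread-pool primitives. The sketch below uses a hypothetical lm_generate call to show the shape of the orchestration, not the paper's RL-trained implementation:

```python
# Sketch of the spawn()/join() orchestration that APR describes; the
# lm_generate call and query decomposition are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

def lm_generate(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call."""
    return f"<answer to: {prompt[:40]}>"

def solve(query: str, branches: list[str]) -> str:
    with ThreadPoolExecutor() as executor:
        # spawn(): each child thread explores one reasoning branch.
        futures = [executor.submit(lm_generate, b) for b in branches]
        # join(): the parent waits for all children before aggregating.
        results = [f.result() for f in futures]
    return lm_generate(query + "\n" + "\n".join(results))
```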
- The Surprising Effectiveness of Test-Time Training for Few-Shot Learning [59.309477460893916]
Language models (LMs) have shown impressive performance on tasks within their training distribution, but often struggle with structurally novel tasks. We investigate the effectiveness of test-time training (TTT) as a mechanism for improving LMs' reasoning and few-shot learning capabilities. Our findings highlight the limitations of in-context learning for novel tasks and demonstrate the potential of test-time training to enhance language model adaptability.
arXiv Detail & Related papers (2024-11-11T18:59:45Z)
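In outline, test-time training fine-tunes a copy of the model on the few-shot examples before predicting. A minimal sketch, with an assumed optimizer, learning rate, and step count:

```python
# Minimal test-time training sketch: adapt a copy of the model on the
# few-shot examples at inference time. Hyperparameters are assumptions.
import copy
import torch

def test_time_train(model, loss_fn, few_shot_batch, steps=10, lr=1e-4):
    tuned = copy.deepcopy(model)  # leave the base model untouched
    opt = torch.optim.SGD(tuned.parameters(), lr=lr)
    tuned.train()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(tuned, few_shot_batch)  # loss_fn: caller-supplied
        loss.backward()
        opt.step()
    tuned.eval()
    return tuned  # predict with the adapted copy, then discard it
```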
- End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames [55.72994484532856]
Temporal action detection (TAD) has seen significant performance improvement with end-to-end training.
Due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training.
We reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames.
arXiv Detail & Related papers (2023-11-28T21:31:04Z)