Related papers: U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario

U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario

URL: http://arxiv.org/abs/2501.00907v1
Date: Wed, 01 Jan 2025 17:47:22 GMT
Title: U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario
Authors: Jiaxin Song, Xinyu Wang, Yihao Wang, Yifan Tang, Ru Zhang, Jianyi Liu, Gongshen Liu,
Abstract summary: We propose an uncertainty-guided firewall for toxic speech in few-shot scenarios, U-GIFT.<n>U-GIFT combines active learning with Bayesian Neural Networks (BNNs) to automatically identify high-quality samples from unlabeled data.<n>In the 5-shot setting, it achieves a 14.92% performance improvement over the basic model.
Score: 13.954929026841413
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the widespread use of social media, user-generated content has surged on online platforms. When such content includes hateful, abusive, offensive, or cyberbullying behavior, it is classified as toxic speech, posing a significant threat to the online ecosystem's integrity and safety. While manual content moderation is still prevalent, the overwhelming volume of content and the psychological strain on human moderators underscore the need for automated toxic speech detection. Previously proposed detection methods often rely on large annotated datasets; however, acquiring such datasets is both costly and challenging in practice. To address this issue, we propose an uncertainty-guided firewall for toxic speech in few-shot scenarios, U-GIFT, that utilizes self-training to enhance detection performance even when labeled data is limited. Specifically, U-GIFT combines active learning with Bayesian Neural Networks (BNNs) to automatically identify high-quality samples from unlabeled data, prioritizing the selection of pseudo-labels with higher confidence for training based on uncertainty estimates derived from model predictions. Extensive experiments demonstrate that U-GIFT significantly outperforms competitive baselines in few-shot detection scenarios. In the 5-shot setting, it achieves a 14.92\% performance improvement over the basic model. Importantly, U-GIFT is user-friendly and adaptable to various pre-trained language models (PLMs). It also exhibits robust performance in scenarios with sample imbalance and cross-domain settings, while showcasing strong generalization across various language applications. We believe that U-GIFT provides an efficient solution for few-shot toxic speech detection, offering substantial support for automated content moderation in cyberspace, thereby acting as a firewall to promote advancements in cybersecurity.

Related papers

Not-in-Perspective: Towards Shielding Google's Perspective API Against Adversarial Negation Attacks [1.675857332621569]
cyberbullying has escalated the need for effective ways to monitor and moderate online interactions.<n>Existing solutions of automated toxicity detection systems, are based on a machine or deep learning algorithms.<n>We present a set of formal reasoning-based methodologies that wrap around existing machine learning toxicity detection systems.
arXiv Detail & Related papers (2026-02-10T02:27:28Z)
Offensive Language Detection on Social Media Using XLNet [0.0]
We propose an automatic offensive language detection model based on XLNet, a generalized autoregressive pretraining method, and compare its performance with BERT (Bigressive Representations from Transformers)<n>Our experimental results show that XLNet outperforms BERT in detecting offensive content and in categorizing the types of offenses, while BERT performs slightly better in identifying the targets of the offenses.<n>These findings highlight the potential of transfer learning and XLNet-based architectures to create robust systems for detecting offensive language on social media platforms.
arXiv Detail & Related papers (2025-06-26T22:37:35Z)
Few-shot Hate Speech Detection Based on the MindSpore Framework [2.6396343924017915]
We propose MS-Hate, a prompt-enhanced neural framework for few-shot hate speech detection implemented on the MindSpore deep learning platform. Experimental results on two benchmark datasets-HateXplain and HSOL-demonstrate that our approach outperforms competitive baselines in precision, recall, and F1-score. These findings highlight the potential of combining prompt-based learning with adversarial augmentation for robust and adaptable hate speech detection in few-shot scenarios.
arXiv Detail & Related papers (2025-04-22T15:42:33Z)
Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks [21.930305838969133]
We introduce textbfFuzzed textbfRandomized textbfSmoothing (textbfFRS), a novel approach for efficiently certifying language model robustness against backdoor attacks. Our theoretical analysis demonstrates that FRS achieves a broader certified robustness radius compared to existing methods.
arXiv Detail & Related papers (2025-02-09T12:03:59Z)
Prompt-based Unifying Inference Attack on Graph Neural Networks [24.85661326294946]
We propose a novel Prompt-based unifying Inference Attack framework on Graph neural networks (GNNs)<n>ProIA retains the crucial topological information of the graph during pre-training, enhancing the background knowledge of the inference attack model.<n>It then utilizes a unified prompt and introduces additional disentanglement factors in downstream attacks to adapt to task-relevant knowledge.
arXiv Detail & Related papers (2024-12-20T09:56:17Z)
Scalable and Effective Negative Sample Generation for Hyperedge Prediction [55.9298019975967]
Hyperedge prediction is crucial for understanding complex multi-entity interactions in web-based applications. Traditional methods often face difficulties in generating high-quality negative samples due to imbalance between positive and negative instances. We present the scalable and effective negative sample generation for Hyperedge Prediction (SEHP) framework, which utilizes diffusion models to tackle these challenges.
arXiv Detail & Related papers (2024-11-19T09:16:25Z)
Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adrial robustness has been conventionally believed as a challenging property to encode for neural networks. We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
Unleashing the Power of Unlabeled Data: A Self-supervised Learning Framework for Cyber Attack Detection in Smart Grids [6.5023425872686085]
We propose a self-supervised learning-based framework to detect and identify various types of cyber attacks. The proposed framework does not rely on large amounts of well-curated labeled data but makes use of the massive unlabeled data in the wild. Experiment results in a 5-area power grid system with 37 buses demonstrate the superior performance of our framework over existing approaches.
arXiv Detail & Related papers (2024-05-22T20:04:52Z)
Enabling Privacy-Preserving Cyber Threat Detection with Federated Learning [4.475514208635884]
This study systematically profiles the (in)feasibility of learning for privacy-preserving cyber threat detection in terms of effectiveness, byzantine resilience, and efficiency. It shows that FL-trained detection models can achieve a performance that is comparable to centrally trained counterparts. Under a realistic threat model, FL turns out to be adversary-resistant to attacks of both data poisoning and model poisoning.
arXiv Detail & Related papers (2024-04-08T01:16:56Z)
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining. Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z)
Adversarial training with informed data selection [53.19381941131439]
Adrial training is the most efficient solution to defend the network against these malicious attacks. This work proposes a data selection strategy to be applied in the mini-batch training. The simulation results show that a good compromise can be obtained regarding robustness and standard accuracy.
arXiv Detail & Related papers (2023-01-07T12:09:50Z)
Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings [56.93025161787725]
Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing local data. We propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters. We show that the attribute inference attack is achievable for SER systems trained using FL.
arXiv Detail & Related papers (2021-12-26T16:50:42Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a textitgap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space. Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
Active Fuzzing for Testing and Securing Cyber-Physical Systems [8.228859318969082]
We propose active fuzzing, an automatic approach for finding test suites of packet-level CPS network attacks. Key to our solution is the use of online active learning, which iteratively updates the models by sampling payloads. We evaluate the efficacy of active fuzzing by implementing it for a water purification plant testbed, finding it can automatically discover a test suite of flow, pressure, and over/underflow attacks.
arXiv Detail & Related papers (2020-05-28T16:19:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.