Hate Speech Detection with Generalizable Target-aware Fairness
- URL: http://arxiv.org/abs/2406.00046v2
- Date: Tue, 11 Jun 2024 13:18:14 GMT
- Title: Hate Speech Detection with Generalizable Target-aware Fairness
- Authors: Tong Chen, Danny Wang, Xurong Liang, Marten Risius, Gianluca Demartini, Hongzhi Yin
- Abstract summary: Generalizable target-aware Fairness (GetFair) is a new method for fairly classifying each post that contains diverse and even unseen targets during inference.
GetFair trains a series of filter functions in an adversarial pipeline to deceive a discriminator that recovers the targeted group from filtered post embeddings.
Experiments on two HSD datasets show the advantageous performance of GetFair on out-of-sample targets.
- Score: 31.019324291704628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To counter the side effect brought by the proliferation of social media platforms, hate speech detection (HSD) plays a vital role in halting the dissemination of toxic online posts at an early stage. However, given the ubiquitous topical communities on social media, a trained HSD classifier easily becomes biased towards specific targeted groups (e.g., female and black people), where a high rate of false positive/negative results can significantly impair public trust in the fairness of content moderation mechanisms, and eventually harm the diversity of online society. Although existing fairness-aware HSD methods can smooth out some discrepancies across targeted groups, they are mostly specific to a narrow selection of targets that are assumed to be known and fixed. This inevitably prevents those methods from generalizing to real-world use cases where new targeted groups constantly emerge over time. To tackle this defect, we propose Generalizable target-aware Fairness (GetFair), a new method for fairly classifying each post that contains diverse and even unseen targets during inference. To remove the HSD classifier's spurious dependence on target-related features, GetFair trains a series of filter functions in an adversarial pipeline, so as to deceive the discriminator that recovers the targeted group from filtered post embeddings. To maintain scalability and generalizability, we innovatively parameterize all filter functions via a hypernetwork that is regularized by the semantic affinity among targets. Taking a target's pretrained word embedding as input, the hypernetwork generates the weights used by each target-specific filter on-the-fly without storing dedicated filter parameters. Finally, comparative experiments on two HSD datasets have shown advantageous performance of GetFair on out-of-sample targets.
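To make the described pipeline more concrete, below is a minimal sketch of hypernetwork-generated, target-specific filtering trained against a target-group discriminator, in the spirit of the abstract. All module names, dimensions, losses, and the trade-off weight are illustrative assumptions rather than the authors' implementation, and the semantic-affinity regularizer on the hypernetwork is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FilterHyperNet(nn.Module):
    """Generates a target-specific filter's weights on-the-fly from the
    target's pretrained word embedding (dimensions are illustrative)."""
    def __init__(self, target_emb_dim=300, post_emb_dim=768):
        super().__init__()
        self.d = post_emb_dim
        # Emits a (d x d) weight matrix plus a bias vector for the filter.
        self.generator = nn.Sequential(
            nn.Linear(target_emb_dim, 512),
            nn.ReLU(),
            nn.Linear(512, post_emb_dim * post_emb_dim + post_emb_dim),
        )

    def forward(self, target_emb, post_emb):
        params = self.generator(target_emb)                   # (B, d*d + d)
        W = params[:, : self.d * self.d].view(-1, self.d, self.d)
        b = params[:, self.d * self.d :]
        # Apply the generated filter to the post embedding.
        return torch.bmm(W, post_emb.unsqueeze(-1)).squeeze(-1) + b

hyper = FilterHyperNet()
hsd_head = nn.Linear(768, 2)        # hate / non-hate classifier head
discriminator = nn.Linear(768, 50)  # recovers the target group (50 groups assumed)

def filter_side_loss(post_emb, target_word_emb, hate_label, target_label):
    filtered = hyper(target_word_emb, post_emb)
    # The HSD classifier is trained on the filtered representation.
    cls_loss = F.cross_entropy(hsd_head(filtered), hate_label)
    # Adversarial term: reward the filter for fooling the discriminator,
    # so the targeted group cannot be recovered from filtered embeddings.
    adv_loss = -F.cross_entropy(discriminator(filtered), target_label)
    return cls_loss + 0.1 * adv_loss  # 0.1 is an assumed trade-off weight
```

In a full adversarial setup the discriminator would be updated separately on the non-negated cross-entropy, alternating with the filter and classifier updates; only the filter/classifier side is shown here.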
Related papers
- Fair Deepfake Detectors Can Generalize [51.21167546843708]
We show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions.
Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals.
DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art methods.
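As a rough illustration of the first component (demographic-aware data rebalancing), the sketch below shows generic inverse-propensity weighting and subgroup-wise feature normalization; the function names and weighting scheme are assumptions for illustration, not DAID's released code.

```python
import numpy as np

def inverse_propensity_weights(group_ids):
    """Weight each sample by the inverse frequency of its demographic
    subgroup, so over-represented subgroups are down-weighted."""
    group_ids = np.asarray(group_ids)
    groups, counts = np.unique(group_ids, return_counts=True)
    freq = dict(zip(groups, counts / len(group_ids)))
    return np.array([1.0 / freq[g] for g in group_ids])

def subgroup_feature_normalization(features, group_ids):
    """Standardize features within each subgroup to neutralize
    group-specific distributional shifts."""
    features, group_ids = np.asarray(features, dtype=float), np.asarray(group_ids)
    out = features.copy()
    for g in np.unique(group_ids):
        mask = group_ids == g
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0) + 1e-8
        out[mask] = (features[mask] - mu) / sigma
    return out
```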
arXiv Detail & Related papers (2025-07-03T14:10:02Z)
- Fairness-aware Anomaly Detection via Fair Projection [24.68178499460169]
Unsupervised anomaly detection is critical in high-social-impact applications such as finance, healthcare, social media, and cybersecurity.
In these scenarios, possible bias from anomaly detection systems can lead to unfair treatment for different groups and even exacerbate social bias.
We propose a novel fairness-aware anomaly detection method FairAD.
arXiv Detail & Related papers (2025-05-16T11:26:00Z)
- Rethinking Target Label Conditioning in Adversarial Attacks: A 2D Tensor-Guided Generative Approach [26.259289475583522]
Multi-target adversarial attacks have garnered significant attention due to their ability to generate adversarial images for multiple target classes simultaneously.
To address this gap, we first identify and validate that the semantic feature quality and quantity are critical factors affecting the transferability of targeted attacks.
We propose the 2D-TGAF framework, which leverages the powerful generative capabilities of diffusion models to encode target labels into two-dimensional semantic tensors.
arXiv Detail & Related papers (2025-04-19T02:08:48Z)
- Pick of the Bunch: Detecting Infrared Small Targets Beyond Hit-Miss Trade-Offs via Selective Rank-Aware Attention [22.580497586948198]
Infrared small target detection faces the inherent challenge of precisely localizing dim targets amidst complex background clutter.
We propose SeRankDet, a deep network that achieves high accuracy beyond the conventional hit-miss trade-off.
arXiv Detail & Related papers (2024-08-07T12:10:32Z)
- Fairly Accurate: Optimizing Accuracy Parity in Fair Target-Group Detection [10.104304963621946]
Group Accuracy Parity (GAP) is the first differentiable loss function with a one-to-one mapping to accuracy parity (AP).
GAP better mitigates bias in comparison with other commonly employed loss functions.
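For intuition, one differentiable surrogate that penalizes per-group accuracy gaps is sketched below; this is a generic accuracy-parity penalty assumed for illustration, not the exact GAP formulation from the paper.

```python
import torch
import torch.nn.functional as F

def accuracy_parity_penalty(logits, labels, group_ids):
    """Differentiable accuracy-parity surrogate: use each sample's predicted
    probability of its true class as a soft correctness score and penalize
    the spread of the group-wise means of that score."""
    probs = F.softmax(logits, dim=-1)
    soft_correct = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # (B,)
    group_means = torch.stack(
        [soft_correct[group_ids == g].mean() for g in torch.unique(group_ids)]
    )
    # Small when all groups are classified equally well.
    return group_means.var(unbiased=False)

def total_loss(logits, labels, group_ids, lam=1.0):
    # Standard cross-entropy plus the parity penalty; lam is an assumed weight.
    return F.cross_entropy(logits, labels) + lam * accuracy_parity_penalty(
        logits, labels, group_ids
    )
```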
arXiv Detail & Related papers (2024-07-16T17:23:41Z)
- General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments [75.58342268895564]
We use Deep Generative Networks (DGNs) with a novel training mechanism to eliminate the distribution gap.
The trained DGNs align the distribution of adversarial samples with clean ones for the target DNNs by translating pixel values.
Our strategy demonstrates its unique effectiveness and generality against black-box attacks.
arXiv Detail & Related papers (2022-12-11T01:51:31Z)
- Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert Spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
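As a loose analogue, a kernel dependence score between model outputs and a sensitive attribute can be estimated as sketched below (an HSIC-style estimator assumed for illustration; FairCOCCO's normalized conditional cross-covariance criterion differs in its exact form).

```python
import torch

def rbf_kernel(x, sigma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix for a batch of feature vectors.
    d2 = torch.cdist(x, x) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic_dependence(outputs, sensitive, sigma=1.0):
    """Empirical HSIC between model outputs and sensitive attributes:
    close to zero when the two are statistically independent."""
    n = outputs.shape[0]
    K = rbf_kernel(outputs.reshape(n, -1).float(), sigma)
    L = rbf_kernel(sensitive.reshape(n, -1).float(), sigma)
    H = torch.eye(n) - torch.ones(n, n) / n  # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```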
arXiv Detail & Related papers (2022-11-11T11:28:46Z)
- How Biased are Your Features?: Computing Fairness Influence Functions with Global Sensitivity Analysis [38.482411134083236]
Fairness in machine learning has attained significant focus due to the widespread application in high-stake decision-making tasks.
We introduce the Fairness Influence Function (FIF), which breaks down bias into its components among individual features and the intersection of multiple features.
Experiments demonstrate that FairXplainer captures FIFs of individual and intersectional features, provides a better approximation of bias based on FIFs, shows a higher correlation of FIFs with fairness interventions, and detects changes in bias due to affirmative/punitive fairness actions in the classifier.
arXiv Detail & Related papers (2022-06-01T04:02:16Z)
- Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z)
- PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function [13.417003144007156]
Adversarial attacks tend to rely on the principle of transferability.
Ensemble methods against adversarial attacks demonstrate that an adversarial example is less likely to mislead multiple classifiers.
Recent ensemble methods have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation.
arXiv Detail & Related papers (2021-12-09T14:26:13Z)
- CD-UAP: Class Discriminative Universal Adversarial Perturbation [83.60161052867534]
A single universal adversarial perturbation (UAP) can be added to all natural images to change most of their predicted class labels.
We propose a new universal attack method to generate a single perturbation that fools a target network into misclassifying only a chosen group of classes.
arXiv Detail & Related papers (2020-10-07T09:26:42Z)
- Double Targeted Universal Adversarial Perturbations [83.60161052867534]
We introduce double targeted universal adversarial perturbations (DT-UAPs) to bridge the gap between instance-discriminative, image-dependent perturbations and generic universal perturbations.
We show the effectiveness of the proposed DTA algorithm on a wide range of datasets and also demonstrate its potential as a physical attack.
arXiv Detail & Related papers (2020-10-07T09:08:51Z)
- Mitigating Face Recognition Bias via Group Adaptive Classifier [53.15616844833305]
This work aims to learn a fair face representation, where faces of every group could be more equally represented.
Our work mitigates face recognition bias across demographic groups while maintaining competitive accuracy.
arXiv Detail & Related papers (2020-06-13T06:43:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.