Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts
- URL: http://arxiv.org/abs/2307.10213v1
- Date: Fri, 14 Jul 2023 13:33:28 GMT
- Title: Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts
- Authors: Shaina Raza, Chen Ding, Deval Pandya
- Abstract summary: We propose an approach that involves a two-step process: first, detecting hate speech using a classifier, and then utilizing a debiasing component that generates less biased or unbiased alternatives through prompts.
We evaluated our approach on a benchmark dataset and observed a reduction in the negativity of hate speech comments.
- Score: 0.6827423171182153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discriminatory language and biases are often present in hate speech during conversations, usually with negative impacts on targeted groups such as those based on race, gender, and religion. To tackle this issue, we propose an approach that involves a two-step process: first, detecting hate speech using a classifier, and then utilizing a debiasing component that generates less biased or unbiased alternatives through prompts. We evaluated our approach on a benchmark dataset and observed a reduction in the negativity of hate speech comments. The proposed method contributes to the ongoing efforts to reduce biases in online discourse and promote a more inclusive and fair environment for communication.
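A minimal sketch of the two-step process described in the abstract, assuming off-the-shelf Hugging Face models; the model names, label strings, and prompt template are illustrative placeholders, not the authors' actual configuration:

```python
# Hypothetical sketch: detect hate speech, then prompt a model to rewrite it.
# Model choices and the prompt are assumptions, not the paper's exact setup.
from transformers import pipeline

# Step 1: hate speech detection with a publicly available classifier.
detector = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",
)

# Step 2: a prompted seq2seq model generates a less biased alternative.
rewriter = pipeline("text2text-generation", model="google/flan-t5-base")

def debias_comment(comment: str) -> str:
    """Return the comment unchanged if benign, else a prompted rewrite."""
    label = detector(comment)[0]["label"]  # e.g. "hate" / "nothate"
    if label == "nothate":
        return comment
    prompt = (
        "Rewrite the following comment so that it makes the same point "
        "without hateful or biased language: " + comment
    )
    return rewriter(prompt, max_new_tokens=64)[0]["generated_text"]
```

In practice the choice of classifier, rewriting model, and prompt wording would all need tuning against the benchmark dataset mentioned above.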
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- HateDebias: On the Diversity and Variability of Hate Speech Debiasing [14.225997610785354]
We propose a benchmark, named HateDebias, to analyze the ability of hate speech detection models under continuously changing environments.
Specifically, to meet the diversity of biases, we collect existing hate speech detection datasets with different types of biases.
We compare the detection accuracy of models trained on datasets with a single type of bias against their performance on HateDebias, where a significant performance drop is observed.
arXiv Detail & Related papers (2024-06-07T12:18:02Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate to hate examples often leads to low model performance (one common remedy is sketched below).
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
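One common remedy for the label imbalance noted in the summary above is to weight the training loss inversely to class frequency. The class counts and the two-class setup below are hypothetical assumptions, not figures from the paper:

```python
# Hypothetical class-weighting sketch for an imbalanced hate speech dataset.
import torch
import torch.nn as nn

n_non_hate, n_hate = 9_000, 1_000                # assumed, illustrative counts
counts = torch.tensor([n_non_hate, n_hate], dtype=torch.float)
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights

loss_fn = nn.CrossEntropyLoss(weight=weights)    # minority errors cost ~9x more

logits = torch.randn(4, 2)                       # dummy batch of model outputs
labels = torch.tensor([0, 0, 1, 0])              # mostly the majority class
loss = loss_fn(logits, labels)
```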
- Hate Speech Classifiers Learn Human-Like Social Stereotypes [4.132204773132937]
Social stereotypes negatively impact individuals' judgements about different groups and may play a critical role in how people understand language directed toward minority social groups.
arXiv Detail & Related papers (2021-10-28T01:35:41Z)
- Towards generalisable hate speech detection: a review on obstacles and solutions [6.531659195805749]
This survey paper attempts to summarise how generalisable existing hate speech detection models are.
It sums up existing attempts at addressing the main obstacles, and then proposes directions of future research to improve generalisation in hate speech detection.
arXiv Detail & Related papers (2021-02-17T17:27:48Z)
- Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis [3.3008315224941978]
This paper presents a novel scheme for the annotation of hate speech in corpora of Web 2.0 commentary.
It is motivated by the critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta.
We suggest a multi-layer annotation scheme, which is pilot-tested against a binary +/- hate speech classification and appears to yield higher inter-annotator agreement.
arXiv Detail & Related papers (2020-08-14T07:39:21Z)
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
- Demoting Racial Bias in Hate Speech Detection [39.376886409461775]
In current hate speech datasets, there exists a correlation between annotators' perceptions of toxicity and signals of African American English (AAE).
In this paper, we use adversarial training to mitigate this bias, introducing a hate speech classifier that learns to detect toxic sentences while demoting confounds corresponding to AAE texts.
Experimental results on a hate speech dataset and an AAE dataset suggest that our method substantially reduces the false positive rate for AAE text while only minimally affecting hate speech classification performance (a toy sketch of this kind of adversarial demotion follows below).
arXiv Detail & Related papers (2020-05-25T17:43:22Z)
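Adversarial demotion of a confound, as the summary above describes, is commonly implemented with a gradient-reversal layer: an adversary tries to predict the confound (here, an AAE signal) from the encoder's representation, while reversed gradients push the encoder to make that prediction hard. The toy dimensions and single-linear "encoder" below are assumptions, not the authors' architecture:

```python
# Toy gradient-reversal sketch of adversarial confound demotion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output            # flip gradients flowing into the encoder

encoder = nn.Linear(300, 64)           # stand-in for a real text encoder
tox_head = nn.Linear(64, 2)            # toxic vs. non-toxic
adv_head = nn.Linear(64, 2)            # adversary: AAE vs. non-AAE signal

x = torch.randn(8, 300)                # dummy batch of text features
y_tox = torch.randint(0, 2, (8,))
y_aae = torch.randint(0, 2, (8,))

h = torch.relu(encoder(x))
loss = (
    F.cross_entropy(tox_head(h), y_tox)
    + F.cross_entropy(adv_head(GradReverse.apply(h)), y_aae)
)
loss.backward()                        # encoder gradients oppose the adversary
```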
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)