Related papers: Reducing Unintended Identity Bias in Russian Hate Speech Detection

URL: http://arxiv.org/abs/2010.11666v1
Date: Thu, 22 Oct 2020 12:54:14 GMT
Title: Reducing Unintended Identity Bias in Russian Hate Speech Detection
Authors: Nadezhda Zueva, Madina Kabirova, Pavel Kalaidin
Abstract summary: This paper describes our efforts towards classifying hate speech in Russian. We propose simple techniques of reducing unintended bias, such as generating training data with language models using terms and words related to protected identities as context.
Score: 0.21485350418225244
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Toxicity has become a grave problem for many online communities and has been growing across many languages, including Russian. Hate speech creates an environment of intimidation and discrimination, and may even incite real-world violence. Both researchers and social platforms have been focused on developing models to detect toxicity in online communication for a while now. A common problem of these models is the presence of bias towards some words (e.g. woman, black, jew) that are not toxic, but serve as triggers for the classifier due to model caveats. In this paper, we describe our efforts towards classifying hate speech in Russian, and propose simple techniques of reducing unintended bias, such as generating training data with language models using terms and words related to protected identities as context and applying word dropout to such words.
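
Illustration: the abstract mentions applying word dropout to words related to protected identities. The sketch below shows one plausible reading of that idea and is not the authors' implementation. The function name, the term list (taken from the abstract's English examples), the drop probability, and the optional mask token are all assumptions made for illustration.

```python
import random

# Hypothetical term list, borrowed from the examples in the abstract.
IDENTITY_TERMS = {"woman", "black", "jew"}


def identity_word_dropout(tokens, identity_terms=IDENTITY_TERMS,
                          drop_prob=0.5, mask_token=None, rng=None):
    """Randomly drop (or mask) identity terms in a tokenized training example.

    Each token whose lowercase form is in `identity_terms` is removed with
    probability `drop_prob`. If `mask_token` is given, the token is replaced
    by it instead of being removed. All other tokens are left untouched.
    """
    rng = rng or random.Random()
    result = []
    for token in tokens:
        is_identity = token.lower() in identity_terms
        if is_identity and rng.random() < drop_prob:
            if mask_token is not None:
                result.append(mask_token)
            continue  # skip the original identity token
        result.append(token)
    return result


if __name__ == "__main__":
    example = "she is a black woman and a great engineer".split()
    # Apply during training only; evaluation text is left unchanged.
    print(identity_word_dropout(example, drop_prob=0.5, rng=random.Random(0)))
```

The intent of such augmentation would be to make the classifier rely less on the mere presence of an identity term and more on the surrounding context when deciding whether a text is toxic.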

Related papers

A Survey on Automatic Online Hate Speech Detection in Low-Resource Languages [0.5825410941577593]
Social media and the easy accessibility of the internet have facilitated the spread of hate speech. This article provides a detailed survey of hate speech detection in low-resource languages around the world.
arXiv Detail & Related papers (2024-11-28T09:42:53Z)
Exploring Large Language Models for Hate Speech Detection in Rioplatense Spanish [0.08192907805418582]
Hate speech detection deals with many language variants, slang, slurs, expression modalities, and cultural nuances. This work presents a brief analysis of the performance of large language models in the detection of Hate Speech for Rioplatense Spanish.
arXiv Detail & Related papers (2024-10-16T02:32:12Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs). By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments [2.162419921663162]
We propose a novel end-to-end model, HCDIR, for Hate Context Detection, and Hate Intensity Reduction in social media posts. We fine-tuned several pre-trained language models to detect hateful comments to ascertain the best-performing hateful comments detection model.
arXiv Detail & Related papers (2023-12-20T17:05:46Z)
Developing Linguistic Patterns to Mitigate Inherent Human Bias in Offensive Language Detection [1.6574413179773761]
We propose a linguistic data augmentation approach to reduce bias in labeling processes. This approach has the potential to improve offensive language classification tasks across multiple languages.
arXiv Detail & Related papers (2023-12-04T10:20:36Z)
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations. We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems. This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topic. To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations. We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity. Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
One to rule them all: Towards Joint Indic Language Hate Speech Detection [7.296361860015606]
We present a multilingual architecture using state-of-the-art transformer language models to jointly learn hate and offensive speech detection. On the provided testing corpora, we achieve Macro F1 scores of 0.7996, 0.7748, 0.8651 for sub-task 1A and 0.6268, 0.5603 during the fine-grained classification of sub-task 1B.
arXiv Detail & Related papers (2021-09-28T13:30:00Z)
Towards generalisable hate speech detection: a review on obstacles and solutions [6.531659195805749]
This survey paper attempts to summarise how generalisable existing hate speech detection models are. It sums up existing attempts at addressing the main obstacles, and then proposes directions of future research to improve generalisation in hate speech detection.
arXiv Detail & Related papers (2021-02-17T17:27:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.