A Weakly Supervised Classifier and Dataset of White Supremacist Language
- URL: http://arxiv.org/abs/2306.15732v1
- Date: Tue, 27 Jun 2023 18:19:32 GMT
- Title: A Weakly Supervised Classifier and Dataset of White Supremacist Language
- Authors: Michael Miller Yoder, Ahmad Diab, David West Brown, Kathleen M. Carley
- Abstract summary: We present a dataset and classifier for detecting the language of white supremacist extremism.
Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data.
- Score: 6.893512627479197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a dataset and classifier for detecting the language of white
supremacist extremism, a growing issue in online hate speech. Our weakly
supervised classifier is trained on large datasets of text from explicitly
white supremacist domains paired with neutral and anti-racist data from similar
domains. We demonstrate that this approach improves generalization performance
to new domains. Incorporating anti-racist texts as counterexamples to white
supremacist language mitigates bias.
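The abstract's central idea is weak supervision by source domain: training text is labeled by the kind of site it was collected from, not by annotating each post. A minimal sketch of that labeling step follows. It is an illustration under stated assumptions, not the authors' pipeline. The domain names, the `DOMAIN_LABELS` mapping, and the helpers `weak_label` and `label_counts` are hypothetical, and the downstream classifier is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from source domain to weak label. Labels come from
# where a text was collected, not from annotating each post individually.
DOMAIN_LABELS: Dict[str, int] = {
    "white_supremacist_forum": 1,  # positive: explicitly white supremacist source
    "neutral_forum": 0,            # negative: neutral text from a similar domain
    "anti_racist_forum": 0,        # negative: anti-racist counterexamples
}


def weak_label(posts: Iterable[Tuple[str, str]],
               domain_labels: Dict[str, int]) -> List[Tuple[str, int]]:
    """Give each (text, domain) pair the label of its source domain.

    Posts from domains missing from the mapping are skipped, not guessed.
    """
    return [(text, domain_labels[domain])
            for text, domain in posts
            if domain in domain_labels]


def label_counts(labeled: Iterable[Tuple[str, int]]) -> Counter:
    """Count examples per label to check class balance before training."""
    return Counter(label for _, label in labeled)


if __name__ == "__main__":
    posts = [
        ("example post A", "white_supremacist_forum"),
        ("example post B", "neutral_forum"),
        ("example post C", "anti_racist_forum"),
        ("example post D", "unmapped_site"),
    ]
    data = weak_label(posts, DOMAIN_LABELS)
    print(data)                # [('example post A', 1), ('example post B', 0), ('example post C', 0)]
    print(label_counts(data))  # Counter({0: 2, 1: 1})
```

Mapping anti-racist text to the negative class mirrors the abstract's use of anti-racist texts as counterexamples. One plausible reading is that this discourages a classifier from treating any mention of race as a positive signal.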
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
Date: 2024-10-06T13:09:48Z
- LAHM : Large Annotated Dataset for Multi-Domain and Multilingual Hate Speech Identification [2.048680519934008]
We present a new multilingual hate speech analysis dataset for English, Hindi, Arabic, French, German and Spanish languages.
This paper is the first to address the problem of identifying various types of hate speech across five broad domains in these six languages.
Date: 2023-04-03T12:03:45Z
- On The Robustness of Offensive Language Classifiers [10.742675209112623]
Social media platforms are deploying machine learning based offensive language classification systems to combat hateful, racist, and other forms of offensive speech at scale.
We study the robustness of state-of-the-art offensive language classifiers against craftier adversarial attacks.
Our results show that these attacks can degrade the accuracy of offensive language classifiers by more than 50% while preserving the readability and meaning of the modified text.
Date: 2022-03-21T20:44:30Z
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate to hate examples often leads to low model performance.
Date: 2022-01-15T20:48:14Z
- Reducing Target Group Bias in Hate Speech Detectors [56.94616390740415]
We show that text classification models trained on large publicly available datasets may significantly underperform on several protected groups.
We propose to perform token-level hate sense disambiguation and to use tokens' hate sense representations for detection.
Date: 2021-12-07T17:49:34Z
- Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework [9.84413545378636]
Recent research has demonstrated how racial biases against users who write African American English exist in popular toxic language datasets.
We propose additional descriptive fairness metrics to better understand the source of these biases.
We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets.
Date: 2021-09-27T15:54:05Z
- Detecting White Supremacist Hate Speech using Domain Specific Word Embedding with Deep Learning and BERT [0.0]
White supremacist hate speech is one of the most recently observed forms of harmful content on social media.
This research investigates the viability of automatically detecting white supremacist hate speech on Twitter by using deep learning and natural language processing techniques.
Date: 2020-10-01T12:44:24Z
- Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model [1.9336815376402716]
We introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model called BERT.
We evaluate the proposed model on two publicly available datasets annotated for racism, sexism, hate or offensive content on Twitter.
Date: 2020-08-14T16:47:25Z
- It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations [68.16751625956243]
Training on only perfect Standard English corpora predisposes neural networks to discriminate against minorities from non-standard linguistic backgrounds.
We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples.
Date: 2020-05-09T04:01:43Z
- Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension [96.62963688510035]
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation.
We present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation.
Date: 2020-04-13T17:20:08Z
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
Date: 2020-04-09T19:50:32Z