Developing Linguistic Patterns to Mitigate Inherent Human Bias in
Offensive Language Detection
- URL: http://arxiv.org/abs/2312.01787v1
- Date: Mon, 4 Dec 2023 10:20:36 GMT
- Title: Developing Linguistic Patterns to Mitigate Inherent Human Bias in
Offensive Language Detection
- Authors: Toygar Tanyel, Besher Alkurdi, Serkan Ayvaz
- Abstract summary: We propose a linguistic data augmentation approach to reduce bias in labeling processes.
This approach has the potential to improve offensive language classification tasks across multiple languages.
- Score: 1.6574413179773761
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the proliferation of social media, there has been a sharp increase in
offensive content, particularly targeting vulnerable groups, exacerbating
social problems such as hatred, racism, and sexism. Detecting offensive
language use is crucial to prevent offensive language from being widely shared
on social media. However, the accurate detection of irony, implication, and
various forms of hate speech on social media remains a challenge. Natural
language-based deep learning models require extensive training with large,
comprehensive, and labeled datasets. Unfortunately, manually creating such
datasets is both costly and error-prone. Additionally, the presence of
human bias in offensive language datasets is a major concern for deep learning
models. In this paper, we propose a linguistic data augmentation approach to
reduce bias in labeling processes, which aims to mitigate the influence of
human bias by leveraging the power of machines to improve the accuracy and
fairness of labeling processes. This approach has the potential to improve
offensive language classification tasks across multiple languages and reduce
the prevalence of offensive content on social media.
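The abstract describes augmenting labeled data with linguistic variants so that machine checks can surface inconsistent human labels. As a rough illustration only, the toy sketch below flags comments whose surface variants receive inconsistent labels under a stand-in classifier; the lexicon, the synonym swaps, and the `classify`/`variants`/`audit` helpers are all hypothetical and are not the authors' actual method.

```python
# Hypothetical sketch: augmentation-based label auditing.
# Generate surface variants of each labeled comment and flag items whose
# variants disagree under a toy rule-based classifier.

OFFENSIVE_TERMS = {"idiot", "stupid"}  # toy lexicon (assumption)


def classify(text: str) -> str:
    """Toy classifier: flags any text containing a lexicon term."""
    tokens = set(text.lower().split())
    return "offensive" if OFFENSIVE_TERMS & tokens else "neutral"


def variants(text: str) -> list[str]:
    """Toy linguistic augmentation: simple synonym swaps (illustrative only)."""
    swaps = {"stupid": "foolish", "idiot": "fool"}
    out = [text]
    for src, dst in swaps.items():
        if src in text:
            out.append(text.replace(src, dst))
    return out


def audit(dataset: list[tuple[str, str]]) -> list[str]:
    """Return comments whose augmented variants get inconsistent labels,
    or whose human label never matches any machine label."""
    flagged = []
    for text, human_label in dataset:
        labels = {classify(v) for v in variants(text)}
        if len(labels) > 1 or human_label not in labels:
            flagged.append(text)
    return flagged


data = [("you are stupid", "offensive"), ("have a nice day", "neutral")]
print(audit(data))  # the first comment is flagged: its variant flips labels
```

In a real pipeline, the rule-based classifier would be replaced by a trained model and the synonym swaps by proper linguistic augmentation; the point of the sketch is only the audit loop, which localizes labels that are sensitive to meaning-preserving rewording.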
Related papers
- Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering
Dehumanizing Language [11.946719280041789]
This paper evaluates the performance of cutting-edge NLP models, including GPT-4, GPT-3.5, and LLAMA-2 in identifying dehumanizing language.
Our findings reveal that while these models demonstrate potential, achieving a 70% accuracy rate in distinguishing dehumanizing language from broader hate speech, they also display biases.
arXiv Detail & Related papers (2024-02-21T13:57:36Z)
- Hate Speech and Offensive Language Detection using an Emotion-aware
Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper proposes a multi-task joint learning approach that combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z)
- KOLD: Korean Offensive Language Dataset [11.699797031874233]
We present the Korean Offensive Language Dataset (KOLD), 40k comments labeled with offensiveness, target, and targeted group information.
We show that title information serves as context and helps discern the target of hatred, especially when it is omitted from the comment.
arXiv Detail & Related papers (2022-05-23T13:58:45Z)
- On The Robustness of Offensive Language Classifiers [10.742675209112623]
Social media platforms are deploying machine learning based offensive language classification systems to combat hateful, racist, and other forms of offensive speech at scale.
We study the robustness of state-of-the-art offensive language classifiers against more crafty adversarial attacks.
Our results show that these crafty adversarial attacks can degrade the accuracy of offensive language classifiers by more than 50% while also being able to preserve the readability and meaning of the modified text.
arXiv Detail & Related papers (2022-03-21T20:44:30Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Semi-automatic Generation of Multilingual Datasets for Stance Detection
in Twitter [9.359018642178917]
This paper presents a method to obtain multilingual datasets for stance detection in Twitter.
We leverage user-based information to semi-automatically label large amounts of tweets.
arXiv Detail & Related papers (2021-01-28T13:05:09Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text
Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.