Tackling Hate Speech in Low-resource Languages with Context Experts
- URL: http://arxiv.org/abs/2303.16828v1
- Date: Wed, 29 Mar 2023 16:24:22 GMT
- Authors: Daniel Nkemelu, Harshil Shah, Irfan Essa, Michael L. Best
- Abstract summary: This paper presents findings from our remote study on the automatic detection of hate speech online in Myanmar.
We argue that effectively addressing this problem will require community-based approaches that combine the knowledge of context experts with machine learning tools.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given Myanmar's historical and socio-political context, hate speech spread on
social media has escalated into offline unrest and violence. This paper
presents findings from our remote study on the automatic detection of hate
speech online in Myanmar. We argue that effectively addressing this problem
will require community-based approaches that combine the knowledge of context
experts with machine learning tools that can analyze the vast amount of data
produced. To this end, we develop a systematic process to facilitate this
collaboration covering key aspects of data collection, annotation, and model
validation strategies. We highlight challenges in this area stemming from small
and imbalanced datasets, the need to balance non-glamorous data work and
stakeholder priorities, and closed data-sharing practices. Stemming from these
findings, we discuss avenues for further work in developing and deploying hate
speech detection systems for low-resource languages.
Related papers
- Empirical Evaluation of Public HateSpeech Datasets
Social media platforms are widely utilised for generating datasets employed in training and evaluating machine learning algorithms for hate speech detection.
Existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hate speech classification.
This work aims to advance the development of more accurate and reliable machine learning models for hate speech detection.
arXiv Detail & Related papers (2024-06-27T11:20:52Z)
- MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection
Hate speech poses significant social, psychological, and occasionally physical threats to targeted individuals and communities.
Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training.
We scrutinized over 60 datasets, selectively integrating the pertinent ones into MetaHate.
Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models.
arXiv Detail & Related papers (2024-01-12T11:54:53Z)
- Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation
We propose a data augmentation approach that addresses the problem of lack of data for online hate speech detection in limited data contexts.
We present three methods to synthesize new examples of hate speech data in a target language that retain the hate sentiment of the original examples but transfer the hate targets.
Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain.
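The target-transfer idea described above can be sketched roughly as follows. This is a hypothetical toy illustration, not the paper's implementation; the function name, example strings, and target lexicon are all placeholders:

```python
# Illustrative sketch of target-transfer data augmentation: synthesize new
# labeled examples by swapping the targeted group while keeping the
# sentiment of the original text. All names here are hypothetical.

def transfer_targets(examples, source_target, new_targets):
    """For each (text, label) example mentioning `source_target`,
    emit one synthetic copy per entry in `new_targets`."""
    synthetic = []
    for text, label in examples:
        if source_target in text:
            for target in new_targets:
                synthetic.append((text.replace(source_target, target), label))
    return synthetic

seed = [("GROUP_A are ruining this country", "hate")]
augmented = transfer_targets(seed, "GROUP_A", ["GROUP_B", "GROUP_C"])
print(len(augmented))  # 2 synthetic examples from 1 seed
```

In practice such substitution would need morphological and grammatical handling for the target language, which this sketch omits.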
arXiv Detail & Related papers (2023-10-04T15:10:06Z)
- Uncovering Political Hate Speech During Indian Election Campaign: A New Low-Resource Dataset and Baselines
IEHate dataset contains 11,457 manually annotated Hindi tweets related to the Indian Assembly Election Campaign from November 1, 2021, to March 9, 2022.
We benchmark the dataset using a range of machine learning, deep learning, and transformer-based algorithms.
Notably, the higher scores achieved by human evaluation compared to the algorithms emphasize the importance of combining human and automated approaches for effective hate speech moderation.
arXiv Detail & Related papers (2023-06-26T15:17:54Z)
- Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper proposes a multi-task joint learning approach that incorporates external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z)
- Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data
We discuss how these techniques can be used to detect political content across different platforms.
We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
arXiv Detail & Related papers (2022-07-01T15:23:23Z)
- Deep Learning for Hate Speech Detection: A Comparative Study
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
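One standard mitigation for this kind of label imbalance is to weight each class by the inverse of its frequency so that rare hate examples contribute more to the training loss. The sketch below illustrates that heuristic under that assumption; the paper does not specify this exact scheme:

```python
# Minimal sketch (an assumption, not the paper's method): "balanced"
# inverse-frequency class weights for an imbalanced hate speech dataset.
from collections import Counter

def inverse_frequency_weights(labels):
    counts = Counter(labels)
    total = len(labels)
    # w_c = total / (num_classes * count_c), the common "balanced" heuristic.
    return {c: total / (len(counts) * n) for c, n in counts.items()}

labels = ["non-hate"] * 90 + ["hate"] * 10  # 9:1 imbalance
weights = inverse_frequency_weights(labels)
print(weights)  # hate gets weight 5.0, non-hate roughly 0.56
```

These weights would then be passed to the loss function (e.g., a weighted cross-entropy) so the model is penalized more for misclassifying the minority hate class.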
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Automatic Evaluation and Moderation of Open-domain Dialogue Systems
A long-standing challenge for researchers is the lack of effective automatic evaluation metrics.
This paper describes the data, baselines, and results obtained for Track 5 at the Dialogue System Technology Challenge 10 (DSTC10).
arXiv Detail & Related papers (2021-11-03T10:08:05Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.