Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection
- URL: http://arxiv.org/abs/2303.02513v1
- Date: Sat, 4 Mar 2023 22:28:29 GMT
- Title: Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection
- Authors: Md Rabiul Awal, Roy Ka-Wei Lee, Eshaan Tanwar, Tanmay Garg, Tanmoy
Chakraborty
- Abstract summary: Hate speech in social media is a growing phenomenon, and detecting such toxic content has gained significant traction.
HateMAML is a model-agnostic meta-learning-based framework that effectively performs hate speech detection in low-resource languages.
Extensive experiments are conducted on five datasets across eight different low-resource languages.
- Score: 23.97444551607624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hate speech in social media is a growing phenomenon, and detecting such toxic
content has recently gained significant traction in the research community.
Existing studies have explored fine-tuning language models (LMs) to perform
hate speech detection, and these solutions have yielded significant
performance. However, most of these studies are limited to detecting hate
speech only in English, neglecting the bulk of hateful content that is
generated in other languages, particularly in low-resource languages.
Developing a classifier that captures hate speech and nuances in a low-resource
language with limited data is extremely challenging. To fill the research gap,
we propose HateMAML, a model-agnostic meta-learning-based framework that
effectively performs hate speech detection in low-resource languages. HateMAML
utilizes a self-supervision strategy to overcome the limitation of data
scarcity and produces better LM initialization for fast adaptation to an unseen
target language (i.e., cross-lingual transfer) or other hate speech datasets
(i.e., domain generalization). Extensive experiments are conducted on five
datasets across eight different low-resource languages. The results show that
HateMAML outperforms the state-of-the-art baselines by more than 3% in the
cross-domain multilingual transfer setting. We also conduct ablation studies to
analyze the characteristics of HateMAML.
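The abstract's core idea, meta-learning an initialization that adapts quickly to a new language from a few examples, can be illustrated with a toy first-order MAML loop. This is a minimal sketch on 1-D linear regression tasks, not the authors' HateMAML implementation; the task distribution, learning rates, and support/query splits are invented for illustration.

```python
import random

def loss_and_grad(w, data):
    """Mean squared error and its gradient w.r.t. the scalar weight w."""
    l, g = 0.0, 0.0
    for x, y in data:
        err = w * x - y
        l += err * err
        g += 2 * err * x
    n = len(data)
    return l / n, g / n

def maml_train(slopes, meta_steps=200, inner_lr=0.05, outer_lr=0.05):
    """Meta-learn an initialization over tasks y = a * x (first-order MAML)."""
    random.seed(0)
    w = 0.0  # meta-initialization to be learned
    for _ in range(meta_steps):
        a = random.choice(slopes)                 # sample a task (slope a)
        support = [(x, a * x) for x in (-1.0, 0.5, 2.0)]
        query   = [(x, a * x) for x in (-2.0, 1.0, 3.0)]
        _, g_in = loss_and_grad(w, support)
        w_task = w - inner_lr * g_in              # inner adaptation step
        _, g_out = loss_and_grad(w_task, query)   # query loss after adaptation
        w -= outer_lr * g_out                     # first-order meta-update
    return w

# Meta-train on tasks whose slopes cluster around 1.0; the learned
# initialization lands near the center of the task distribution,
# so one inner gradient step suffices to fit any individual task.
w_meta = maml_train(slopes=[0.8, 1.0, 1.2])
print(round(w_meta, 2))
```

In HateMAML, the scalar weight is replaced by a multilingual LM's parameters and tasks are sampled from hate speech data in different languages, with self-supervision standing in for scarce labels in the target language.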
Related papers
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages [35.185808055004344]
Most hate speech datasets so far focus on English-language content.
More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators.
We explore data-efficient strategies for expanding hate speech detection into under-resourced languages.
arXiv Detail & Related papers (2022-10-20T15:49:00Z)
- Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models [14.128029444990895]
We introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models.
MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset.
We train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.
arXiv Detail & Related papers (2022-06-20T17:54:39Z)
- Highly Generalizable Models for Multilingual Hate Speech Detection [0.0]
Hate speech detection has become an important research topic within the past decade.
We compile a dataset of 11 languages and resolve their differing annotation schemes by analyzing the combined data with binary labels: hate speech or not hate speech.
We conduct three types of experiments for a binary hate speech classification task: Multilingual-Train Monolingual-Test, Monolingual-Train Monolingual-Test, and Language-Family-Train Monolingual-Test scenarios.
arXiv Detail & Related papers (2022-01-27T03:09:38Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Cross-lingual Capsule Network for Hate Speech Detection in Social Media [6.531659195805749]
We investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another.
We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech.
Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita 2018 and AMI@Ibereval 2018.
arXiv Detail & Related papers (2021-08-06T12:53:41Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)