Related papers: Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models

URL: http://arxiv.org/abs/2210.13029v2
Date: Tue, 25 Oct 2022 08:20:35 GMT
Title: Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Authors: Syrielle Montariol, Arij Riabi, Djamé Seddah
Abstract summary: We show how hate speech detection models benefit from a cross-lingual knowledge proxy brought by auxiliary tasks fine-tuning. We propose to train on multilingual auxiliary tasks to improve zero-shot transfer of hate speech detection models across languages.
Score: 3.97478982737167
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Zero-shot cross-lingual transfer learning has been shown to be highly challenging for tasks involving a lot of linguistic specificities or when a cultural gap is present between languages, such as in hate speech detection. In this paper, we highlight this limitation for hate speech detection in several domains and languages using strict experimental settings. Then, we propose to train on multilingual auxiliary tasks -- sentiment analysis, named entity recognition, and tasks relying on syntactic information -- to improve zero-shot transfer of hate speech detection models across languages. We show how hate speech detection models benefit from a cross-lingual knowledge proxy brought by auxiliary tasks fine-tuning and highlight these tasks' positive impact on bridging the hate speech linguistic and cultural gap between languages.

Related papers

LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention [2.199918533021483]
The overlap between vocal traits such as accent, vocal anatomy, and a language's phonetic structure complicates separating linguistic and speaker information. Disentangling these components can significantly improve speaker recognition accuracy. We propose a novel disentanglement learning strategy that integrates joint learning through prefix-tuned cross-attention.
arXiv Detail & Related papers (2025-06-02T10:59:31Z)
Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study [59.30098850050971]
This work evaluates LLM prompting-based detection across eight non-English languages. We show that while zero-shot and few-shot prompting lag behind fine-tuned encoder models on most of the real-world evaluation sets, they achieve better generalization on functional tests for hate speech detection.
arXiv Detail & Related papers (2025-05-09T16:00:01Z)
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment [50.80949663719335]
Multilingual sentence encoders (MSEs) are commonly obtained by training multilingual language models to map sentences from different languages into a shared semantic space. MSEs are subject to the curse of multilinguality, a loss of monolingual representational accuracy due to parameter sharing. We train the cross-lingual adapters with two different types of data to resolve the conflicting requirements of different cross-lingual tasks.
arXiv Detail & Related papers (2024-07-20T13:56:39Z)
No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing. The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages [35.185808055004344]
Most hate speech datasets so far focus on English-language content. More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators. We explore data-efficient strategies for expanding hate speech detection into under-resourced languages.
arXiv Detail & Related papers (2022-10-20T15:49:00Z)
Code-Switching without Switching: Language Agnostic End-to-End Speech Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem. By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z)
Highly Generalizable Models for Multilingual Hate Speech Detection [0.0]
Hate speech detection has become an important research topic within the past decade. We compile a dataset of 11 languages and resolve different by analyzing the combined data with binary labels: hate speech or not hate speech. We conduct three types of experiments for a binary hate speech classification task: Multilingual-Train Monolingual-Test, Monolingual-Train Monolingual-Test and Language-Family-Train Monolingual-Test scenarios.
arXiv Detail & Related papers (2022-01-27T03:09:38Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
Cross-lingual Capsule Network for Hate Speech Detection in Social Media [6.531659195805749]
We investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another. We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech. Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita 2018 and AMI@Ibereval 2018.
arXiv Detail & Related papers (2021-08-06T12:53:41Z)
Cross-lingual hate speech detection based on multilingual domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning. Our goal is to determine if knowledge from one particular language can be used to classify another language. We show that the use of our simple yet specific multilingual hate representations improves classification results.
arXiv Detail & Related papers (2021-04-30T02:24:50Z)
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts. Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source. We observe that our representations embed typology and strengthen correlations with language relationships. We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages. We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.