Cross-lingual Capsule Network for Hate Speech Detection in Social Media
- URL: http://arxiv.org/abs/2108.03089v1
- Date: Fri, 6 Aug 2021 12:53:41 GMT
- Title: Cross-lingual Capsule Network for Hate Speech Detection in Social Media
- Authors: Aiqi Jiang, Arkaitz Zubiaga
- Abstract summary: We investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another.
We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech.
Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita 2018 and AMI@Ibereval 2018.
- Score: 6.531659195805749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most hate speech detection research focuses on a single language, generally
English, which limits their generalisability to other languages. In this paper
we investigate the cross-lingual hate speech detection task, tackling the
problem by adapting the hate speech resources from one language to another. We
propose a cross-lingual capsule network learning model coupled with extra
domain-specific lexical semantics for hate speech (CCNL-Ex). Our model achieves
state-of-the-art performance on benchmark datasets from AMI@Evalita2018 and
AMI@Ibereval2018 involving three languages: English, Spanish and Italian,
outperforming state-of-the-art baselines on all six language pairs.
Related papers
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z) - LAHM : Large Annotated Dataset for Multi-Domain and Multilingual Hate
Speech Identification [2.048680519934008]
We present a new multilingual hate speech analysis dataset for English, Hindi, Arabic, French, German and Spanish languages.
This paper is the first to address the problem of identifying various types of hate speech in these five wide domains in these six languages.
arXiv Detail & Related papers (2023-04-03T12:03:45Z) - Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently-proposed Raw Audio-Visual Speechs (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z) - Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec
Language Modeling [92.55131711064935]
We propose a cross-lingual neural language model, VALL-E X, for cross-lingual speech synthesis.
VALL-E X inherits strong in-context learning capabilities and can be applied for zero-shot cross-lingual text-to-speech synthesis and zero-shot speech-to-speech translation tasks.
It can generate high-quality speech in the target language via just one speech utterance in the source language as a prompt while preserving the unseen speaker's voice, emotion, and acoustic environment.
arXiv Detail & Related papers (2023-03-07T14:31:55Z) - M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for
Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z) - Data-Efficient Strategies for Expanding Hate Speech Detection into
Under-Resourced Languages [35.185808055004344]
Most hate speech datasets so far focus on English-language content.
More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators.
We explore data-efficient strategies for expanding hate speech detection into under-resourced languages.
arXiv Detail & Related papers (2022-10-20T15:49:00Z) - Highly Generalizable Models for Multilingual Hate Speech Detection [0.0]
Hate speech detection has become an important research topic within the past decade.
We compile a dataset of 11 languages and resolve different by analyzing the combined data with binary labels: hate speech or not hate speech.
We conduct three types of experiments for a binary hate speech classification task: Multilingual-Train Monolingual-Test, MonolingualTrain Monolingual-Test and Language-Family-Train Monolingual Test scenarios.
arXiv Detail & Related papers (2022-01-27T03:09:38Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Cross-lingual hate speech detection based on multilingual
domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning.
Our goal is to determine if knowledge from one particular language can be used to classify other language.
We show that the use of our simple yet specific multilingual hate representations improves classification results.
arXiv Detail & Related papers (2021-04-30T02:24:50Z) - AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.