Exploring multi-task multi-lingual learning of transformer models for
hate speech and offensive speech identification in social media
- URL: http://arxiv.org/abs/2101.11155v1
- Date: Wed, 27 Jan 2021 01:25:22 GMT
- Authors: Sudhanshu Mishra, Shivangi Prasad, Shubhanshu Mishra
- Abstract summary: We use a multi-task and multi-lingual approach to solve three sub-tasks for hate speech.
These sub-tasks were part of the 2019 shared task on hate speech and offensive content (HASOC) identification in Indo-European languages.
We show that it is possible to utilize different combined approaches to obtain models that generalize easily across different languages and tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Hate Speech has become a major content moderation issue for online social
media platforms. Given the volume and velocity of online content production, it
is impossible to manually moderate hate speech related content on any platform.
In this paper we utilize a multi-task and multi-lingual approach based on
recently proposed Transformer Neural Networks to solve three sub-tasks for hate
speech. These sub-tasks were part of the 2019 shared task on hate speech and
offensive content (HASOC) identification in Indo-European languages. We expand
on our submission to that competition by utilizing multi-task models that are
trained using three approaches: a) multi-task learning with separate task
heads, b) back-translation, and c) multi-lingual training. Finally, we
investigate the performance of various models and identify instances where
Transformer-based models perform differently and better. We show that it is
possible to utilize these combined approaches to obtain models that generalize
easily across different languages and tasks, while trading off a slight loss
in accuracy (in some cases) for a much reduced inference-time compute cost. We
open source an updated version of our HASOC 2019 code with the new improvements
at https://github.com/socialmediaie/MTML_HateSpeech.
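
To make approach (a) concrete, below is a minimal sketch of a shared multilingual encoder with separate per-task classification heads. It assumes PyTorch and the HuggingFace transformers library; the class name, head names, and label counts are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of approach (a): one shared multilingual encoder with a
# separate classification head per HASOC sub-task. Class/head names and
# label counts are illustrative assumptions, not the authors' released
# code (see https://github.com/socialmediaie/MTML_HateSpeech for that).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskHateSpeechModel(nn.Module):
    def __init__(self, encoder_name="bert-base-multilingual-cased"):
        super().__init__()
        # Shared encoder: every sub-task reuses these weights, so one
        # encoder pass can serve multiple tasks at inference time.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Separate task heads (label counts follow HASOC 2019 as we
        # recall it: A = HOF/NOT, B = HATE/OFFN/PRFN, C = TIN/UNT).
        self.heads = nn.ModuleDict({
            "subtask_a": nn.Linear(hidden, 2),
            "subtask_b": nn.Linear(hidden, 3),
            "subtask_c": nn.Linear(hidden, 2),
        })

    def forward(self, input_ids, attention_mask, task):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.heads[task](cls)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = MultiTaskHateSpeechModel()
batch = tokenizer(["an example tweet"], return_tensors="pt",
                  padding=True, truncation=True)
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"],
                   task="subtask_a")
```

Because the encoder is shared, serving all three sub-tasks requires roughly one model's worth of memory and one encoder pass per input, which is one way to realize the reduced inference-time compute cost mentioned above.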
Related papers
- SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning [43.71388370559826]
This paper introduces a multi-talker speaking style captioning task to enhance the understanding of speaker and prosodic information.
We used large language models to generate descriptions for multi-talker speech.
We trained our model with pre-training on this captioning task followed by instruction tuning.
arXiv Detail & Related papers (2024-08-25T17:05:26Z)
- MulliVC: Multi-lingual Voice Conversion With Cycle Consistency [75.59590240034261]
MulliVC is a novel voice conversion system that converts only timbre, preserving the original content and source-language prosody, without requiring multi-lingual paired data.
Both objective and subjective results indicate that MulliVC significantly surpasses other methods in both monolingual and cross-lingual contexts.
arXiv Detail & Related papers (2024-08-08T18:12:51Z)
- PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models [19.719401865551745]
We present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis, and two speech classification tasks.
PolySpeech shows competitiveness across various tasks compared to single-task models.
arXiv Detail & Related papers (2024-06-12T01:35:46Z)
- SpeechX: Neural Codec Language Model as a Versatile Speech Transformer [57.82364057872905]
SpeechX is a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks.
Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise.
arXiv Detail & Related papers (2023-08-14T01:01:19Z)
- Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation [65.13824257448564]
This paper proposes a textless training method for many-to-many multilingual speech-to-speech translation.
By treating the speech units as pseudo-text, we can focus on the linguistic content of the speech.
We demonstrate that the proposed UTUT model can be effectively utilized not only for Speech-to-Speech Translation (S2ST) but also for multilingual Text-to-Speech Synthesis (T2S) and Text-to-Speech Translation (T2ST).
arXiv Detail & Related papers (2023-08-03T15:47:04Z)
- VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation [91.39949385661379]
VioLA is a single auto-regressive Transformer decoder-only network that unifies various cross-modal tasks involving speech and text.
We first convert all the speech utterances to discrete tokens using an offline neural encoder.
We further integrate task IDs (TID) and language IDs (LID) into the proposed model to enhance the modeling capability of handling different languages and tasks.
arXiv Detail & Related papers (2023-05-25T14:39:47Z)
- Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper presents a multi-task joint learning approach that combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z)
- ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech [58.93395189153713]
We extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks.
We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes.
Our model shows great improvements over speaker-embedding-based multi-speaker TTS methods.
arXiv Detail & Related papers (2022-11-07T13:35:16Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Cross-lingual hate speech detection based on multilingual domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning.
Our goal is to determine whether knowledge from one particular language can be used to classify other languages.
We show that the use of our simple yet specific multilingual hate representations improves classification results.
arXiv Detail & Related papers (2021-04-30T02:24:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.