A Feature Extraction based Model for Hate Speech Identification
- URL: http://arxiv.org/abs/2201.04227v1
- Date: Tue, 11 Jan 2022 22:53:28 GMT
- Title: A Feature Extraction based Model for Hate Speech Identification
- Authors: Salar Mohtaj, Vera Schmitt, Sebastian Möller
- Abstract summary: This paper presents the TU Berlin team's experiments and results on subtasks 1A and 1B of the 2021 shared task on hate speech and offensive content identification in Indo-European languages.
The success of different Natural Language Processing models is evaluated for the respective subtasks throughout the competition.
Among the tested models, the transfer-learning-based models achieved the best results in both subtasks.
- Score: 2.9005223064604078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The detection of hate speech online has become an important task, as
offensive language such as hurtful, obscene, and insulting content can harm
marginalized people or groups. This paper presents the TU Berlin team's
experiments and results on subtasks 1A and 1B of the 2021 shared task on hate
speech and offensive content identification in Indo-European languages. The
success of different Natural Language Processing models is evaluated for the
respective subtasks throughout the competition. We tested different models
based on recurrent neural networks at the word and character levels, as well as
transfer learning approaches based on BERT, on the dataset provided by the
competition. Among the tested models, the transfer-learning-based models
achieved the best results in both subtasks.
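The word- and character-level RNN inputs mentioned in the abstract can be illustrated with a small tokenization-and-encoding sketch. This is an assumption for illustration, not the TU Berlin team's preprocessing code, and all helper names are hypothetical:

```python
# Illustrative sketch of word-level vs. character-level input preparation
# for RNN text classifiers; not the paper's actual pipeline.

def word_tokens(text):
    """Split a tweet into lower-cased word tokens."""
    return text.lower().split()

def char_tokens(text):
    """Split a tweet into individual lower-cased characters."""
    return list(text.lower())

def build_vocab(sequences):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(seq, vocab, max_len):
    """Convert tokens to ids, then pad or truncate to a fixed length."""
    ids = [vocab.get(tok, 0) for tok in seq]
    return (ids + [0] * max_len)[:max_len]

tweets = ["You are awful", "Have a nice day"]
word_seqs = [word_tokens(t) for t in tweets]
vocab = build_vocab(word_seqs)
encoded = [encode(s, vocab, 5) for s in word_seqs]
```

A character-level model would use the same pipeline with `char_tokens` instead, trading a much smaller vocabulary for longer input sequences.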
Related papers
- M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
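Label imbalance of the kind described above is commonly mitigated with inverse-frequency class weights in the training loss. The sketch below illustrates that general technique (an assumption for illustration, not this paper's method):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight class c by N / (K * count_c), where N is the number of
    examples and K the number of classes, so rare classes such as
    'hate' contribute proportionally more to a weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# A 9:1 imbalanced toy label set, mimicking a hate speech dataset.
labels = ["non-hate"] * 90 + ["hate"] * 10
weights = inverse_frequency_weights(labels)
```

Here the minority "hate" class receives nine times the weight of the majority class, counteracting the skew toward non-hate examples.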
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning [1.77356577919977]
We propose an approach to automatically classify tweets into three classes: hate, offensive, and neither.
We create a class module that contains the main functionality, including text classification, sentiment checking, and text data augmentation.
arXiv Detail & Related papers (2021-08-06T20:59:47Z)
- AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection [5.649040805759824]
This paper proposes a novel multitask learning-based model, AngryBERT, which jointly learns hate speech detection with sentiment classification and target identification as secondary relevant tasks.
Experiment results show that AngryBERT outperforms state-of-the-art single-task-learning and multitask learning baselines.
arXiv Detail & Related papers (2021-03-14T16:17:26Z)
- Transfer Learning Approach for Arabic Offensive Language Detection System -- BERT-Based Model [0.0]
Cyberhate, online harassment and other misuses of technology are on the rise.
Applying advanced techniques from the Natural Language Processing (NLP) field to support the development of an online hate-free community is a critical task for social justice.
This study aims to investigate the effects of fine-tuning and training the Bidirectional Encoder Representations from Transformers (BERT) model on multiple Arabic offensive language datasets individually.
arXiv Detail & Related papers (2021-02-09T04:58:18Z)
- Exploring multi-task multi-lingual learning of transformer models for hate speech and offensive speech identification in social media [0.0]
We use a multi-task and multi-lingual approach to solve three sub-tasks for hate speech.
These sub-tasks were part of the 2019 shared task on hate speech and offensive content (HASOC) identification in Indo-European languages.
We show that it is possible to utilize different combined approaches to obtain models that generalize easily across different languages and tasks.
arXiv Detail & Related papers (2021-01-27T01:25:22Z)
- An Online Multilingual Hate speech Recognition System [13.87667165678441]
We analyse six datasets by combining them into a single homogeneous dataset and classify them into three classes: abusive, hateful, or neither.
We create a tool which identifies and scores a page with an effective metric in near real time and uses that score as feedback to retrain our model.
We demonstrate the competitive performance of our multilingual model on two languages, English and Hindi, achieving comparable or superior performance to most monolingual models.
arXiv Detail & Related papers (2020-11-23T16:33:48Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
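The F1 score quoted above is the harmonic mean of precision and recall; for reference, a minimal computation sketch from raw counts (illustrative only, not the Kungfupanda system's evaluation code):

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts of true positives, false positives,
    and false negatives: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Unlike accuracy, F1 ignores true negatives, which makes it the standard metric when the offensive class is a small minority.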
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.