Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive
Content Identification in Indo-European Languages
- URL: http://arxiv.org/abs/2108.05927v1
- Date: Thu, 12 Aug 2021 19:02:53 GMT
- Title: Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive
Content Identification in Indo-European Languages
- Authors: Thomas Mandl, Sandip Modha, Gautam Kishore Shahi, Amit Kumar Jaiswal,
Durgesh Nandini, Daksh Patel, Prasenjit Majumder and Johannes Schäfer
- Abstract summary: The HASOC track intends to develop and optimize Hate Speech detection algorithms for Hindi, German and English.
The dataset is collected from a Twitter archive and pre-classified by a machine learning system.
Overall, 252 runs were submitted by 40 teams. For task A, the best classification algorithms achieved F1 measures of 0.51, 0.53 and 0.52 for English, Hindi, and German, respectively.
- Score: 2.927129789938848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the growth of social media, the spread of hate speech is also increasing
rapidly. Social media are widely used in many countries, and Hate Speech is
spreading in these countries as well. This creates a need for multilingual Hate Speech
detection algorithms. Much research in this area is dedicated to English at the
moment. The HASOC track intends to provide a platform to develop and optimize
Hate Speech detection algorithms for Hindi, German and English. The dataset is
collected from a Twitter archive and pre-classified by a machine learning
system. HASOC has two sub-tasks for all three languages: task A is a binary
classification problem (Hate and Not Offensive), while task B is a fine-grained
classification problem with three classes: Hate speech (HATE), OFFENSIVE and
PROFANITY. Overall, 252 runs were submitted by 40 teams. For task A, the
best classification algorithms achieved F1 measures of 0.51, 0.53 and
0.52 for English, Hindi, and German, respectively. For task B, the best
classification algorithms achieved F1 measures of 0.26, 0.33 and 0.29 for
English, Hindi, and German, respectively. This article presents the tasks and
the data development as well as the results. The best performing algorithms
were mainly variants of the transformer architecture BERT. However, other
systems were also applied with good success.
Related papers
- SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify [2.9220076568786326]
We present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022) using the associated Arabic Twitter dataset.
For offensive Tweets, sub-task B focuses on detecting whether the tweet is hate speech or not.
For hate speech Tweets, sub-task C focuses on detecting the fine-grained type of hate speech among six different classes.
arXiv Detail & Related papers (2022-07-18T12:33:51Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an m-BERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification [20.632017481940075]
We tackle the Arabic Fine-Grained Hate Speech Detection shared task.
The tasks are to predict if a tweet contains (1) Offensive language; and whether it is considered (2) Hate Speech or not and if so, then predict the (3) Fine-Grained Hate Speech label from one of six categories.
Our final solution is an ensemble of models that employs multitask learning and a self-consistency correction method yielding 82.7% on the hate speech subtask.
arXiv Detail & Related papers (2022-05-16T19:53:16Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages [4.267837363677351]
This paper presents the HASOC subtrack for English, Hindi, and Marathi.
The data set was assembled from Twitter.
For task A, the best classification algorithms achieved F1 measures of 0.91, 0.78 and 0.83 for Marathi, Hindi and English, respectively.
arXiv Detail & Related papers (2021-12-17T03:28:54Z)
- One to rule them all: Towards Joint Indic Language Hate Speech Detection [7.296361860015606]
We present a multilingual architecture using state-of-the-art transformer language models to jointly learn hate and offensive speech detection.
On the provided testing corpora, we achieve Macro F1 scores of 0.7996, 0.7748, 0.8651 for sub-task 1A and 0.6268, 0.5603 during the fine-grained classification of sub-task 1B.
arXiv Detail & Related papers (2021-09-28T13:30:00Z)
- Leveraging Multilingual Transformers for Hate Speech Detection [11.306581296760864]
We leverage state of the art Transformer language models to identify hate speech in a multilingual setting.
With a pre-trained multilingual Transformer-based text encoder at the base, we are able to successfully identify and classify hate speech from multiple languages.
arXiv Detail & Related papers (2021-01-08T20:23:50Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.