AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify
- URL: http://arxiv.org/abs/2207.08557v1
- Date: Mon, 18 Jul 2022 12:33:51 GMT
- Title: AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify
- Authors: Ahmad Shapiro, Ayman Khalafallah, Marwan Torki
- Abstract summary: We present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022) using the associated Arabic Twitter dataset.
For offensive Tweets, sub-task B focuses on detecting whether the tweet is hate speech or not.
For hate speech Tweets, sub-task C focuses on detecting the fine-grained type of hate speech among six different classes.
- Score: 2.9220076568786326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online presence on social media platforms such as Facebook and Twitter has become a daily habit for internet users. Despite the wide range of services these platforms offer, users suffer from cyber-bullying, which leads to mental abuse and may escalate to physical harm against individuals or targeted groups. In this paper, we present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022) using the associated Arabic Twitter dataset. The shared task consists of three sub-tasks: sub-task A focuses on detecting whether a tweet is offensive or not; for offensive tweets, sub-task B focuses on detecting whether the tweet is hate speech or not; and for hate speech tweets, sub-task C focuses on detecting the fine-grained type of hate speech among six classes. Transformer models have proven effective in classification tasks but tend to over-fit when fine-tuned on small or imbalanced datasets. We overcome this limitation by investigating multiple training paradigms, such as contrastive learning and multi-task learning, along with classification fine-tuning and an ensemble of our top 5 performers. Our proposed solution achieved macro-F1 averages of 0.841, 0.817, and 0.476 in sub-tasks A, B, and C, respectively.
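The abstract names contrastive learning as one of the training paradigms investigated. As a rough illustration only (this is not the authors' code; the temperature value, batch construction, and NumPy formulation are assumptions), a supervised contrastive loss over L2-normalized tweet embeddings can be sketched as:

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    z: (n, d) array of tweet embeddings (e.g. transformer [CLS] vectors).
    labels: length-n sequence of class labels.
    Each anchor is pulled toward same-label samples and pushed from the rest.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project to unit sphere
    sim = z @ z.T / tau                                # temperature-scaled cosine sims
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # exclude self-comparisons
    # log-softmax over all other samples for each anchor
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # mean negative log-probability of the positives for each anchor
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

Batches whose same-label embeddings cluster together yield a lower loss; minimizing it during fine-tuning shapes the embedding space before the classification head is trained.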
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an m-BERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification [20.632017481940075]
We tackle the Arabic Fine-Grained Hate Speech Detection shared task.
The tasks are to predict whether a tweet contains (1) offensive language, whether it is considered (2) hate speech, and, if so, (3) the fine-grained hate speech label from one of six categories.
Our final solution is an ensemble of models that employs multitask learning and a self-consistency correction method yielding 82.7% on the hate speech subtask.
arXiv Detail & Related papers (2022-05-16T19:53:16Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets [36.29939722039909]
This paper describes the system proposed by team MIDAS-IIITD for HASOC 2021 subtask 2.
It is one of the first shared tasks focusing on detecting hate speech from Hindi-English code-mixed conversations on Twitter.
Our best performing system, a hard voting ensemble of Indic-BERT, XLM-RoBERTa, and Multilingual BERT, achieved a macro F1 score of 0.7253.
arXiv Detail & Related papers (2021-12-18T19:27:33Z)
- Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages [2.927129789938848]
The HASOC track intends to develop and optimize Hate Speech detection algorithms for Hindi, German and English.
The dataset is collected from a Twitter archive and pre-classified by a machine learning system.
Overall, 252 runs were submitted by 40 teams. The best classification algorithms for task A achieved F1 measures of 0.51, 0.53, and 0.52 for English, Hindi, and German, respectively.
arXiv Detail & Related papers (2021-08-12T19:02:53Z)
- Leveraging Multilingual Transformers for Hate Speech Detection [11.306581296760864]
We leverage state-of-the-art Transformer language models to identify hate speech in a multilingual setting.
With a pre-trained multilingual Transformer-based text encoder at the base, we are able to successfully identify and classify hate speech from multiple languages.
arXiv Detail & Related papers (2021-01-08T20:23:50Z)
- Countering hate on social media: Large scale classification of hate and counter speech [0.0]
Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence.
A potential solution is citizen-generated counter speech where citizens actively engage in hate-filled conversations to attempt to restore civil non-polarized discourse.
Here we made use of a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech.
We used an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups.
arXiv Detail & Related papers (2020-06-02T23:12:52Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
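Several of the systems above, including the AlexU-AIC top-5 ensemble and the MIDAS-IIITD hard-voting ensemble, combine model outputs by majority vote. A minimal sketch (the function name and the tie-breaking rule are illustrative assumptions, not taken from any of the papers):

```python
from collections import Counter

def hard_vote(model_preds):
    """Majority vote over per-model label sequences.

    model_preds: list of prediction lists, one per model, aligned by sample.
    Ties go to the label seen first among the votes, since Counter preserves
    insertion order in CPython 3.7+.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_preds)]
```

With an odd number of binary classifiers there are no ties, which is one reason odd-sized ensembles (such as 3 or 5 members) are a common choice.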