Leveraging Transformers for Hate Speech Detection in Conversational
Code-Mixed Tweets
- URL: http://arxiv.org/abs/2112.09986v1
- Date: Sat, 18 Dec 2021 19:27:33 GMT
- Title: Leveraging Transformers for Hate Speech Detection in Conversational
Code-Mixed Tweets
- Authors: Zaki Mustafa Farooqi, Sreyan Ghosh and Rajiv Ratn Shah
- Abstract summary: This paper describes the system proposed by team MIDAS-IIITD for HASOC 2021 subtask 2.
It is one of the first shared tasks focusing on detecting hate speech from Hindi-English code-mixed conversations on Twitter.
Our best performing system, a hard voting ensemble of Indic-BERT, XLM-RoBERTa, and Multilingual BERT, achieved a macro F1 score of 0.7253.
- Score: 36.29939722039909
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the current era of the internet, where social media platforms are easily
accessible to everyone, people often have to deal with threats, identity
attacks, hate, and bullying due to their association with a caste, creed,
gender, religion, or even acceptance or rejection of a notion. Existing works
in hate speech detection primarily focus on individual comment classification
as a sequence labeling task and often fail to consider the context of the
conversation. The context of a conversation often plays a substantial role when
determining the author's intent and sentiment behind the tweet. This paper
describes the system proposed by team MIDAS-IIITD for HASOC 2021 subtask 2, one
of the first shared tasks focusing on detecting hate speech from Hindi-English
code-mixed conversations on Twitter. We approach this problem using neural
networks, leveraging the transformer's cross-lingual embeddings and further
finetuning them for low-resource hate-speech classification in transliterated
Hindi text. Our best performing system, a hard voting ensemble of Indic-BERT,
XLM-RoBERTa, and Multilingual BERT, achieved a macro F1 score of 0.7253,
placing us first on the overall leaderboard standings.
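The described system can be approximated with off-the-shelf tooling. The following is a minimal sketch, assuming the three encoders (Indic-BERT, XLM-RoBERTa, and Multilingual BERT) have already been fine-tuned for binary hate/offensive classification on the HASOC 2021 code-mixed training data; the checkpoint paths, label mapping, and the hard_vote helper are illustrative assumptions, not code from the paper.

```python
# Minimal hard-voting sketch over three fine-tuned multilingual encoders.
# The checkpoint paths below are hypothetical placeholders for Indic-BERT,
# XLM-RoBERTa, and Multilingual BERT models assumed to be already fine-tuned
# on the HASOC 2021 code-mixed training data.
from collections import Counter

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = [
    "finetuned/indic-bert",        # hypothetical local path
    "finetuned/xlm-roberta-base",  # hypothetical local path
    "finetuned/mbert-cased",       # hypothetical local path
]

# Load each tokenizer/model pair once.
MODELS = [
    (AutoTokenizer.from_pretrained(c), AutoModelForSequenceClassification.from_pretrained(c))
    for c in CHECKPOINTS
]


def hard_vote(text: str) -> int:
    """Return the majority label across the three classifiers (0 = NONE, 1 = HOF; assumed mapping)."""
    votes = []
    for tokenizer, model in MODELS:
        model.eval()
        inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        votes.append(int(logits.argmax(dim=-1).item()))
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    print(hard_vote("is thread mein log kitna bura bol rahe hain"))
```

Averaging the three models' softmax probabilities (soft voting) would be a small variation on the same sketch; the paper's best reported system uses the hard (majority) vote shown here.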
Related papers
- Moshi: a speech-text foundation model for real-time dialogue [78.88479749811376]
Current systems for spoken dialogue rely on pipelines of independent components such as voice activity detection, speech recognition, and text-to-speech.
We show how Moshi can provide streaming speech recognition and text-to-speech.
Our resulting model is the first real-time full-duplex spoken large language model.
arXiv Detail & Related papers (2024-09-17T17:55:39Z)
- Lexical Squad@Multimodal Hate Speech Event Detection 2023: Multimodal Hate Speech Detection using Fused Ensemble Approach [0.23020018305241333]
We present our novel ensemble learning approach for detecting hate speech by classifying text-embedded images into two labels, namely "Hate Speech" and "No Hate Speech".
Our proposed ensemble model yielded promising results, with an accuracy of 75.21 and an F1 score of 74.96.
arXiv Detail & Related papers (2023-09-23T12:06:05Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify [2.9220076568786326]
We present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022) using the associated Arabic Twitter dataset.
For offensive tweets, sub-task B focuses on detecting whether the tweet is hate speech or not.
For hate speech tweets, sub-task C focuses on detecting the fine-grained type of hate speech among six different classes.
arXiv Detail & Related papers (2022-07-18T12:33:51Z)
- Improved two-stage hate speech classification for twitter based on Deep Neural Networks [0.0]
Hate speech is a form of online harassment that involves the use of abusive language.
The model we propose in this work is an extension of an existing approach based on LSTM neural network architectures.
Our study includes a performance comparison of several proposed alternative methods for the second stage evaluated on a public corpus of 16k tweets.
arXiv Detail & Related papers (2022-06-08T20:57:41Z)
- BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance. (A generic class-weighting sketch appears after this list.)
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model [0.5801044612920815]
This paper investigates the feasibility of leveraging domain-specific word embeddings in a Bidirectional LSTM-based deep model to automatically detect/classify hate speech.
The experiments showed that the domain-specific word embeddings with the Bidirectional LSTM-based deep model achieved a 93% F1 score, while BERT achieved up to a 96% F1 score on a combined balanced dataset built from available hate speech datasets. (A minimal BiLSTM sketch appears after this list.)
arXiv Detail & Related papers (2021-11-02T11:42:54Z)
- One to rule them all: Towards Joint Indic Language Hate Speech Detection [7.296361860015606]
We present a multilingual architecture using state-of-the-art transformer language models to jointly learn hate and offensive speech detection.
On the provided testing corpora, we achieve macro F1 scores of 0.7996, 0.7748, and 0.8651 for sub-task 1A, and 0.6268 and 0.5603 for the fine-grained classification of sub-task 1B.
arXiv Detail & Related papers (2021-09-28T13:30:00Z)
- Streaming Multi-talker Speech Recognition with Joint Speaker Identification [77.46617674133556]
SURIT employs the recurrent neural network transducer (RNN-T) as the backbone for both speech recognition and speaker identification.
We validate our idea on a multi-talker dataset derived from Librispeech and present encouraging results.
arXiv Detail & Related papers (2021-04-05T18:37:33Z)
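The label-imbalance issue noted in the cross-lingual entry above is often mitigated by weighting the loss inversely to class frequency. The snippet below is a generic PyTorch illustration of that idea under made-up class counts, not the method of any paper listed here.

```python
# Generic class-weighted loss to counter label imbalance (illustrative only;
# not necessarily what any paper above does). The class counts are made up.
import torch
import torch.nn as nn

counts = torch.tensor([9000.0, 1000.0])            # non-hate vs. hate examples (hypothetical)
weights = counts.sum() / (len(counts) * counts)    # inverse-frequency class weights

criterion = nn.CrossEntropyLoss(weight=weights)

# Dummy batch showing the weighted loss in use: 4 examples, 2 classes.
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
print(float(criterion(logits, labels)))
```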
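Similarly, the BERT-versus-domain-embedding comparison above centers on a Bidirectional LSTM fed with domain-specific word vectors. The following is a minimal sketch of such a model, assuming a precomputed embedding matrix; the vocabulary size, dimensions, and random matrix are placeholders.

```python
# Minimal BiLSTM classifier initialized from a (hypothetical) domain-specific
# embedding matrix; the random matrix stands in for real pretrained vectors.
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    def __init__(self, embedding_matrix: torch.Tensor, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        # freeze=False lets the domain-specific vectors be tuned further during training
        self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
        self.lstm = nn.LSTM(embedding_matrix.size(1), hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embed(token_ids)       # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(emb)      # h_n: (2, batch, hidden) for a 1-layer BiLSTM
        feats = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.out(feats)            # class logits


embedding_matrix = torch.randn(5000, 300)            # vocab of 5000, 300-d vectors (assumed)
model = BiLSTMClassifier(embedding_matrix)
print(model(torch.randint(0, 5000, (2, 20))).shape)  # torch.Size([2, 2])
```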