Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech
Detection
- URL: http://arxiv.org/abs/2401.10653v1
- Date: Fri, 19 Jan 2024 11:59:13 GMT
- Title: Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech
Detection
- Authors: Atanu Mandal, Gargi Roy, Amit Barman, Indranil Dutta, Sudip Kumar
Naskar
- Abstract summary: We present an approach to identify whether a speech promotes hate or not utilizing both audio and textual representations.
Our methodology is based on the Transformer framework and incorporates both audio and text sampling, accompanied by our novel layer called "Attentive Fusion".
- Score: 0.2866102183256175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the recent surge and exponential growth of social media usage,
scrutinizing social media content for the presence of any hateful content is of
utmost importance. Researchers have been working diligently for the past decade
on distinguishing between content that promotes hatred and content that does
not. Traditionally, the main focus has been on analyzing textual content.
However, recent research has also begun to address the identification of
audio-based content. Nevertheless, studies have shown that relying solely on
audio or text-based content may be ineffective, as a recent upsurge indicates
that individuals often employ sarcasm in their speech and writing. To overcome
these challenges, we present an approach that identifies whether speech
promotes hate, utilizing both audio and textual representations. Our
methodology is based on the Transformer framework and incorporates both audio
and text sampling, accompanied by our novel layer called "Attentive Fusion".
The results of our study surpassed previous state-of-the-art techniques,
achieving a macro F1 score of 0.927 on the test set.
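The abstract names the "Attentive Fusion" layer but does not spell out its mechanics here; the following is a minimal, speculative PyTorch sketch of one plausible reading: cross-attention between text and audio token embeddings, mean pooling, and a binary hate/non-hate classifier. All module names and dimensions are illustrative assumptions, not the authors' implementation.
```python
# Speculative sketch of an "Attentive Fusion"-style layer: cross-attention
# between text and audio token embeddings, mean pooling, and a binary
# classifier. Names and dimensions are illustrative.
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.text_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2)
        )

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text: (B, T_text, dim), audio: (B, T_audio, dim)
        t_fused, _ = self.text_to_audio(text, audio, audio)  # text attends to audio
        a_fused, _ = self.audio_to_text(audio, text, text)   # audio attends to text
        pooled = torch.cat([t_fused.mean(1), a_fused.mean(1)], dim=-1)
        return self.classifier(pooled)  # (B, 2) hate / non-hate logits

logits = AttentiveFusion()(torch.randn(8, 20, 256), torch.randn(8, 50, 256))
```
In practice the text and audio sequences would come from pretrained encoders projected to a shared dimension before fusion.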
Related papers
- Double Mixture: Towards Continual Event Detection from Speech [60.33088725100812]
Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events.
This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events.
We propose a novel method, 'Double Mixture,' which merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting.
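The summary points to two ingredients, a mixture of speech experts and a memory mechanism against forgetting; the following is a loose, hypothetical sketch of how those could be combined (gated expert heads plus a small rehearsal buffer), not the paper's actual architecture.
```python
# Hypothetical combination of gated expert heads (the "mixture") with a
# small rehearsal buffer (the "memory") replayed alongside new data.
import random
import torch
import torch.nn as nn

class MixtureHeads(nn.Module):
    def __init__(self, dim: int = 128, n_experts: int = 4, n_events: int = 10):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)  # soft routing over experts
        self.experts = nn.ModuleList(
            [nn.Linear(dim, n_events) for _ in range(n_experts)]
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate(h), dim=-1)                 # (B, n_experts)
        out = torch.stack([e(h) for e in self.experts], dim=1)  # (B, n_experts, n_events)
        return (w.unsqueeze(-1) * out).sum(dim=1)               # gated event logits

rehearsal: list[tuple[torch.Tensor, int]] = []  # stored (feature, label) pairs

def training_batch(new_items: list, k: int = 8) -> list:
    # Mix current-task items with replayed old ones to curb forgetting.
    return new_items + random.sample(rehearsal, min(k, len(rehearsal)))
```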
arXiv Detail & Related papers (2024-04-20T06:32:00Z)
- Lexical Squad@Multimodal Hate Speech Event Detection 2023: Multimodal Hate Speech Detection using Fused Ensemble Approach [0.23020018305241333]
We present our novel ensemble learning approach for detecting hate speech, by classifying text-embedded images into two labels, namely "Hate Speech" and "No Hate Speech".
Our proposed ensemble model yielded promising results, with an accuracy of 75.21 and an F1 score of 74.96.
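As a minimal illustration of the fused-ensemble prediction step, assuming the text and image branches each emit softmax probabilities (the equal weighting here is a generic assumption, not the paper's exact recipe):
```python
# Minimal fused-ensemble step: weighted average of per-branch softmax
# probabilities, then argmax. The branch models are stand-ins.
import numpy as np

def ensemble_predict(p_text: np.ndarray, p_image: np.ndarray,
                     w_text: float = 0.5) -> np.ndarray:
    """p_*: (N, 2) probabilities for ["No Hate Speech", "Hate Speech"]."""
    p = w_text * p_text + (1.0 - w_text) * p_image
    return p.argmax(axis=1)  # 1 = "Hate Speech"

print(ensemble_predict(np.array([[0.7, 0.3]]), np.array([[0.2, 0.8]])))  # [1]
```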
arXiv Detail & Related papers (2023-09-23T12:06:05Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech, with absolute improvements in the range of 1.24%-57.8%.
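CoSyn's distinctive choice is modeling conversational context in hyperbolic space. As a generic building block of such networks (not CoSyn's specific architecture), the exponential map at the origin of the Poincaré ball sends Euclidean context embeddings into hyperbolic space:
```python
# Generic hyperbolic building block: the exponential map at the origin of
# the Poincare ball, exp_0^c(v) = tanh(sqrt(c)*||v||) * v / (sqrt(c)*||v||),
# which places Euclidean context embeddings inside the unit ball.
import torch

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

ball = expmap0(torch.randn(4, 64))
assert (ball.norm(dim=-1) < 1.0).all()  # every point lies in the Poincare ball
```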
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes to countering malicious information by developing multilingual tools to simulate and detect new methods of content-moderation evasion.
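A toy version of the simulate-and-detect loop for word camouflage, using an illustrative leetspeak mapping; the article's multilingual tools are considerably richer:
```python
# Toy camouflage simulator and normalizer: generate a leetspeak evasion
# of a keyword, then map it back before blocklist matching. The mapping
# is an example only.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "$"}
UNLEET = {v: k for k, v in LEET.items()}

def camouflage(word: str) -> str:
    """Evasion side: 'hate' -> 'h4t3'."""
    return "".join(LEET.get(ch, ch) for ch in word.lower())

def normalize(text: str) -> str:
    """Detection side: undo character substitutions before matching."""
    return "".join(UNLEET.get(ch, ch) for ch in text.lower())

assert camouflage("hate") == "h4t3"
assert "hate" in normalize("so much h4t3 today")
```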
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Assessing the impact of contextual information in hate speech detection [0.48369513656026514]
We provide a novel corpus for contextualized hate speech detection based on user responses to news posts from media outlets on Twitter.
This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic.
arXiv Detail & Related papers (2022-10-02T09:04:47Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
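As one concrete instance of the contrastive family mentioned above, here is a minimal InfoNCE loss over paired speech-segment embeddings; production objectives such as wav2vec 2.0's are more elaborate:
```python
# Core of a contrastive objective: InfoNCE over paired segment embeddings,
# with other in-batch items serving as negatives.
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    a = F.normalize(anchors, dim=-1)       # (B, D)
    p = F.normalize(positives, dim=-1)     # (B, D)
    logits = a @ p.t() / temperature       # (B, B) cosine similarities
    targets = torch.arange(a.size(0))      # i-th anchor pairs with i-th positive
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
```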
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Multilingual and Multimodal Abuse Detection [3.4352862428120123]
This paper attempts abuse detection in conversational audio from a multimodal perspective in a multilingual social media setting.
Our proposed method, MADA, explicitly focuses on two modalities other than the audio itself.
We test the proposed approach on 10 different languages and observe consistent gains in the range 0.6%-5.2% by leveraging multiple modalities.
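The summary does not name MADA's two auxiliary modalities, so the following sketch stays generic: gated late fusion of an audio embedding with two auxiliary modality embeddings, with all names and sizes illustrative.
```python
# Generic gated late fusion over three modality embeddings (audio plus
# two auxiliary ones); all names and sizes are illustrative.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int = 128, n_modalities: int = 3):
        super().__init__()
        self.gate = nn.Linear(n_modalities * dim, n_modalities)
        self.head = nn.Linear(dim, 2)  # abusive vs. non-abusive logits

    def forward(self, mods: list[torch.Tensor]) -> torch.Tensor:
        w = torch.softmax(self.gate(torch.cat(mods, dim=-1)), dim=-1)  # (B, M)
        fused = sum(w[:, i:i + 1] * m for i, m in enumerate(mods))     # (B, dim)
        return self.head(fused)

logits = GatedFusion()([torch.randn(4, 128) for _ in range(3)])
```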
arXiv Detail & Related papers (2022-04-03T13:28:58Z)
- Emotion Based Hate Speech Detection using Multimodal Learning [0.0]
This paper proposes the first multimodal deep learning framework to combine the auditory features representing emotion and the semantic features to detect hateful content.
Our results demonstrate that incorporating emotional attributes leads to significant improvement over text-based models in detecting hateful multimedia content.
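A minimal sketch of that fusion idea, assuming the auditory emotion features (e.g., a valence/arousal vector from a speech emotion recognizer) and the semantic text embedding are already computed; the classifier choice is arbitrary:
```python
# Concatenate precomputed emotion features with a text embedding and fit
# any off-the-shelf classifier; feature names and sizes are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse(emotion_feats: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    # emotion_feats: (N, E), text_emb: (N, D) -> fused features (N, E + D)
    return np.concatenate([emotion_feats, text_emb], axis=1)

X = fuse(np.random.rand(10, 8), np.random.rand(10, 768))
y = np.array([0, 1] * 5)  # toy non-hateful / hateful labels
clf = LogisticRegression(max_iter=1000).fit(X, y)
```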
arXiv Detail & Related papers (2022-02-13T05:39:47Z)
- DeepHate: Hate Speech Detection via Multi-Faceted Text Representations [8.192671048046687]
DeepHate is a novel deep learning model that combines multi-faceted text representations such as word embeddings, sentiments, and topical information.
We conduct extensive experiments and evaluate DeepHate on three large publicly available real-world datasets.
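A toy version of such a multi-faceted representation, concatenating mean word vectors, a sentiment score, and a topic distribution (all inputs assumed precomputed; shapes are illustrative, not DeepHate's exact configuration):
```python
# Toy multi-faceted text representation: mean word vectors, a sentiment
# score, and a topic distribution concatenated into one feature vector.
import numpy as np

def multi_faceted(word_vecs: np.ndarray,  # (T, D) token embeddings
                  sentiment: float,       # e.g., a compound score in [-1, 1]
                  topics: np.ndarray      # (K,) topic proportions
                  ) -> np.ndarray:
    return np.concatenate([word_vecs.mean(axis=0), [sentiment], topics])

feat = multi_faceted(np.random.rand(12, 300), 0.4, np.random.rand(20))
print(feat.shape)  # (321,)
```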
arXiv Detail & Related papers (2021-03-14T16:11:30Z)
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate.
We show that there is a strong correlation between our model's understanding of multi-view speech and human perception.
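A rough, hypothetical sketch of view-temporal attention consistent with this description: score camera views per time step, combine them, then attend over time; the paper's exact mechanism may differ.
```python
# Hypothetical view-temporal attention: attend over camera views at each
# time step, then over time steps, yielding one clip-level embedding.
import torch
import torch.nn as nn

class ViewTemporalAttention(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.view_score = nn.Linear(dim, 1)  # per-view relevance
        self.time_score = nn.Linear(dim, 1)  # per-step relevance

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, V, dim) features for V views over T time steps
        wv = torch.softmax(self.view_score(x), dim=2)    # weights over views
        xt = (wv * x).sum(dim=2)                         # (B, T, dim)
        wt = torch.softmax(self.time_score(xt), dim=1)   # weights over time
        return (wt * xt).sum(dim=1)                      # (B, dim)

emb = ViewTemporalAttention()(torch.randn(2, 30, 4, 64))
```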
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
- Deep Audio-Visual Learning: A Survey [53.487938108404244]
We divide the current audio-visual learning tasks into four different subfields.
We discuss state-of-the-art methods as well as the remaining challenges of each subfield.
We summarize the commonly used datasets and performance metrics.
arXiv Detail & Related papers (2020-01-14T13:11:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.