HateMM: A Multi-Modal Dataset for Hate Video Classification
- URL: http://arxiv.org/abs/2305.03915v1
- Date: Sat, 6 May 2023 03:39:00 GMT
- Title: HateMM: A Multi-Modal Dataset for Hate Video Classification
- Authors: Mithun Das, Rohit Raj, Punyajoy Saha, Binny Mathew, Manish Gupta,
Animesh Mukherjee
- Abstract summary: We build deep learning multi-modal models to classify the hate videos and observe that using all the modalities improves the overall hate speech detection performance.
Our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute.
- Score: 8.758311170297192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hate speech has become one of the most significant issues in modern society,
having implications in both the online and the offline world. Due to this, hate
speech research has recently gained a lot of traction. However, most of the
work has primarily focused on text media, with relatively little work on images
and even less on videos. Thus, early-stage automated video moderation
techniques are needed to handle the videos being uploaded and keep the
platform safe and healthy. With a view to detecting and removing hateful content
from video-sharing platforms, our work focuses on hate video detection
using multi-modalities. To this end, we curate ~43 hours of videos from
BitChute and manually annotate them as hate or non-hate, along with the frame
spans that could explain the labelling decision. To collect the relevant
videos, we harnessed search keywords from hate lexicons. We observe various cues
in images and audio of hateful videos. Further, we build deep learning
multi-modal models to classify the hate videos and observe that using all the
modalities of the videos improves the overall hate speech detection performance
(accuracy=0.798, macro F1-score=0.790) by ~5.7% compared to the best uni-modal
model in terms of macro F1 score. In summary, our work takes the first step
toward understanding and modeling hateful videos on video hosting platforms
such as BitChute.
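As a concrete illustration of the multi-modal fusion described above, a minimal late-fusion sketch over precomputed per-modality embeddings might look as follows. The embedding dimensions, dropout, and two-layer head are illustrative assumptions, not the paper's exact architecture.

    # Late-fusion sketch: concatenate precomputed text, audio, and video
    # embeddings, then classify hate vs. non-hate with a small MLP head.
    import torch
    import torch.nn as nn

    class LateFusionHateClassifier(nn.Module):
        def __init__(self, text_dim=768, audio_dim=128, video_dim=512, hidden=256):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(text_dim + audio_dim + video_dim, hidden),
                nn.ReLU(),
                nn.Dropout(0.3),
                nn.Linear(hidden, 2),  # two classes: hate / non-hate
            )

        def forward(self, text_emb, audio_emb, video_emb):
            # Concatenating the modality embeddings is the "all modalities" case;
            # dropping an input would correspond to a uni-modal baseline.
            fused = torch.cat([text_emb, audio_emb, video_emb], dim=-1)
            return self.head(fused)

    # Dummy batch of 4 videos with precomputed per-modality features.
    model = LateFusionHateClassifier()
    logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 512))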
Related papers
- VideoQA in the Era of LLMs: An Empirical Study [108.37456450182054]
Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-intuitive tasks.
This work conducts a timely and comprehensive study of Video-LLMs' behavior in VideoQA.
Our analyses demonstrate that Video-LLMs excel in VideoQA; they can correlate contextual cues and generate plausible responses to questions about varied video content.
However, models falter in handling video temporality, both in reasoning about temporal content ordering and grounding QA-relevant temporal moments.
arXiv Detail & Related papers (2024-08-08T05:14:07Z)
- MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili [11.049937698021054]
This study presents MultiHateClip, a novel multilingual dataset created through hate lexicons and human annotation.
It aims to enhance the detection of hateful videos on platforms such as YouTube and Bilibili, including content in both English and Chinese languages.
arXiv Detail & Related papers (2024-07-28T08:19:09Z)
- Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model [62.38322742493649]
We build a video VQA benchmark covering editing categories, i.e., effect, funny, meme, and game.
Most of the open-source video LMMs perform poorly on the benchmark, indicating a huge domain gap between edited short videos on social media and regular raw videos.
To improve the generalization ability of LMMs, we collect a training set for the proposed benchmark based on both Panda-70M/WebVid raw videos and small-scale TikTok/CapCut edited videos.
arXiv Detail & Related papers (2024-06-15T03:28:52Z)
- InternVideo2: Scaling Foundation Models for Multimodal Video Understanding [51.129913789991924]
InternVideo2 is a new family of video foundation models (FM) that achieve state-of-the-art results in video recognition, video-speech tasks, and video-centric tasks.
Our core design is a progressive training approach that unifies masked video modeling, cross-modal contrastive learning, and next-token prediction, scaling the video encoder up to 6B parameters.
arXiv Detail & Related papers (2024-03-22T17:57:42Z)
- Identifying False Content and Hate Speech in Sinhala YouTube Videos by Analyzing the Audio [0.0]
This study proposes a solution to minimize the spread of violence and misinformation in Sinhala YouTube videos.
The approach involves developing a rating system that assesses whether a video contains false information by comparing the title and description with the audio content.
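A minimal sketch of such a title/description-vs-audio comparison, assuming the audio has already been transcribed by a speech recognizer; the TF-IDF similarity measure and the 0.2 flagging threshold are illustrative assumptions, not the paper's actual rating system.

    # Hypothetical consistency check: low lexical overlap between a video's
    # metadata and its transcript is treated as a signal of misinformation.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def consistency_score(title: str, description: str, transcript: str) -> float:
        # Compare the combined metadata against the spoken content.
        tfidf = TfidfVectorizer().fit_transform([title + " " + description, transcript])
        return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

    score = consistency_score("video title", "video description",
                              "transcript produced by speech recognition")
    flagged = score < 0.2  # illustrative threshold for "likely misleading"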
arXiv Detail & Related papers (2024-01-30T08:08:34Z)
- Lexical Squad@Multimodal Hate Speech Event Detection 2023: Multimodal Hate Speech Detection using Fused Ensemble Approach [0.23020018305241333]
We present our novel ensemble learning approach for detecting hate speech by classifying text-embedded images into two labels, namely "Hate Speech" and "No Hate Speech".
Our proposed ensemble model yielded promising results, with an accuracy of 75.21 and an F1 score of 74.96.
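A fused ensemble of this kind can be pictured as soft voting over per-model class probabilities; the equal weighting and example probabilities below are illustrative assumptions, not necessarily the authors' exact fusion rule.

    # Soft-voting fusion of a text model and an image model for a
    # text-embedded image; weights and probabilities are illustrative.
    import numpy as np

    LABELS = ["No Hate Speech", "Hate Speech"]

    def fuse(p_text: np.ndarray, p_image: np.ndarray, w_text: float = 0.5):
        # Weighted average of the two models' class probabilities.
        p = w_text * p_text + (1.0 - w_text) * p_image
        return LABELS[int(np.argmax(p))], p

    label, probs = fuse(np.array([0.30, 0.70]), np.array([0.55, 0.45]))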
arXiv Detail & Related papers (2023-09-23T12:06:05Z)
- Multi-modal Hate Speech Detection using Machine Learning [0.6793286055326242]
A combined multimodal approach is proposed to detect hate speech in video content by extracting image frames, audio feature values, and text, and applying machine learning and natural language processing.
arXiv Detail & Related papers (2023-06-15T06:46:52Z)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
- Emotion Based Hate Speech Detection using Multimodal Learning [0.0]
This paper proposes the first multimodal deep learning framework to combine the auditory features representing emotion and the semantic features to detect hateful content.
Our results demonstrate that incorporating emotional attributes leads to significant improvement over text-based models in detecting hateful multimedia content.
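A rough sketch of such audio-plus-text fusion, where MFCC statistics stand in for the emotion-related auditory features and a precomputed text embedding for the semantic features; both stand-ins are assumptions, not the paper's exact feature extractors.

    # MFCC statistics as a stand-in for emotion-related audio features,
    # concatenated with a semantic text embedding for a downstream classifier.
    import numpy as np
    import librosa

    def audio_emotion_features(wav_path: str) -> np.ndarray:
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        # Per-coefficient mean and std over time -> 26-dim feature vector.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def fused_features(wav_path: str, text_embedding: np.ndarray) -> np.ndarray:
        # The concatenated vector would feed a classifier such as an MLP.
        return np.concatenate([audio_emotion_features(wav_path), text_embedding])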
arXiv Detail & Related papers (2022-02-13T05:39:47Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate to hate examples often leads to low model performance.
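One standard remedy for this kind of label imbalance (assumed here for illustration, not necessarily the authors' exact fix) is to weight the training loss inversely to class frequency:

    # Inverse-frequency class weights so the rare hate class is not
    # drowned out by non-hate examples during training (counts illustrative).
    import torch
    import torch.nn as nn

    counts = torch.tensor([9000.0, 1000.0])          # non-hate, hate
    weights = counts.sum() / (len(counts) * counts)  # heavier weight on rare class
    criterion = nn.CrossEntropyLoss(weight=weights)

    # Dummy logits and labels for a batch of 8 examples.
    loss = criterion(torch.randn(8, 2), torch.randint(0, 2, (8,)))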
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state-of-the-art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.