Identifying False Content and Hate Speech in Sinhala YouTube Videos by
Analyzing the Audio
- URL: http://arxiv.org/abs/2402.01752v1
- Date: Tue, 30 Jan 2024 08:08:34 GMT
- Title: Identifying False Content and Hate Speech in Sinhala YouTube Videos by
Analyzing the Audio
- Authors: W. A. K. M. Wickramaarachchi, Sameeri Sathsara Subasinghe, K. K.
Rashani Tharushika Wijerathna, A. Sahashra Udani Athukorala, Lakmini
Abeywardhana, A. Karunasena
- Abstract summary: This study proposes a solution to minimize the spread of violence and misinformation in Sinhala YouTube videos.
The approach involves developing a rating system that assesses whether a video contains false information by comparing the title and description with the audio content.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: YouTube faces a global crisis with the dissemination of false information and
hate speech. To counter these issues, YouTube has implemented strict rules
against uploading content that includes false information or promotes hate
speech. While numerous studies have been conducted to reduce offensive
English-language content, there's a significant lack of research on Sinhala
content. This study aims to address the aforementioned gap by proposing a
solution to minimize the spread of violence and misinformation in Sinhala
YouTube videos. The approach involves developing a rating system that assesses
whether a video contains false information by comparing the title and
description with the audio content and evaluating whether the video includes
hate speech. The methodology encompasses several steps, including audio
extraction using the Pytube library, audio transcription via the fine-tuned
Whisper model, hate speech detection employing the distilroberta-base model and
a text classification LSTM model, and text summarization through the fine-tuned
BART-Large-XSUM model. Notably, the Whisper model achieved a 48.99% word
error rate, while the distilroberta-base model demonstrated an F1 score of
0.856 and a recall value of 0.861 in comparison to the LSTM model, which
exhibited signs of overfitting.
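A minimal sketch of the pipeline outlined in the abstract, assuming public stand-in checkpoints: the paper's fine-tuned Sinhala Whisper and distilroberta-base hate-speech models are not referenced here, so openai/whisper-small, distilroberta-base, and facebook/bart-large-xsum act as placeholders, and the TF-IDF cosine-similarity rating is only one plausible way to compare the title and description against the audio content, not necessarily the authors' exact method.

```python
# Hedged sketch of the described pipeline; model names below are public
# placeholders, not the paper's fine-tuned Sinhala checkpoints.
from pytube import YouTube
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rate_video(url: str) -> dict:
    # 1. Audio extraction with the Pytube library
    yt = YouTube(url)
    audio_path = yt.streams.filter(only_audio=True).first().download(filename="audio.mp4")

    # 2. Transcription (placeholder for the paper's fine-tuned Sinhala Whisper model)
    asr = pipeline("automatic-speech-recognition",
                   model="openai/whisper-small", chunk_length_s=30)
    transcript = asr(audio_path)["text"]

    # 3. Hate speech detection (placeholder for the fine-tuned distilroberta-base classifier)
    hate_clf = pipeline("text-classification", model="distilroberta-base")
    hate_result = hate_clf(transcript, truncation=True)[0]

    # 4. Summarize the transcript with the public BART-Large-XSUM checkpoint
    summarizer = pipeline("summarization", model="facebook/bart-large-xsum")
    summary = summarizer(transcript, max_length=60, min_length=10,
                         truncation=True)[0]["summary_text"]

    # 5. Compare the title + description against the audio-derived summary
    #    (simple TF-IDF cosine similarity as an assumed rating signal)
    reference = f"{yt.title} {yt.description or ''}"
    tfidf = TfidfVectorizer().fit_transform([reference, summary])
    consistency = float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

    return {"transcript": transcript, "hate_label": hate_result,
            "summary": summary, "title_consistency": consistency}
```

Calling rate_video on a YouTube URL would return the transcript, a (placeholder) hate-speech label, the summary, and a title-consistency score in [0, 1]; the study's actual rating system may combine these signals differently.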
Related papers
- Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models [50.89022445197919]
We show that open-source audio LMMs suffer an average attack success rate of 69.14% on harmful audio questions.
Our speech-specific jailbreaks on Gemini-1.5-Pro achieve an attack success rate of 70.67% on the harmful query benchmark.
arXiv Detail & Related papers (2024-10-31T12:11:17Z)
- MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili [11.049937698021054]
This study presents MultiHateClip, a novel multilingual dataset created through hate lexicons and human annotation.
It aims to enhance the detection of hateful videos on platforms such as YouTube and Bilibili, including content in both English and Chinese languages.
arXiv Detail & Related papers (2024-07-28T08:19:09Z)
- HateTinyLLM: Hate Speech Detection Using Tiny Large Language Models [0.0]
Hate speech encompasses verbal, written, or behavioral communication that directs derogatory or discriminatory language at individuals or groups.
HateTinyLLM is a novel framework based on fine-tuned decoder-only tiny large language models (tinyLLMs) for efficient hate speech detection.
arXiv Detail & Related papers (2024-04-26T05:29:35Z)
- Lexical Squad@Multimodal Hate Speech Event Detection 2023: Multimodal Hate Speech Detection using Fused Ensemble Approach [0.23020018305241333]
We present our novel ensemble learning approach for detecting hate speech by classifying text-embedded images into two labels, namely "Hate Speech" and "No Hate Speech".
Our proposed ensemble model yielded promising results, with an accuracy of 75.21 and an F1 score of 74.96.
arXiv Detail & Related papers (2023-09-23T12:06:05Z)
- Multi-modal Hate Speech Detection using Machine Learning [0.6793286055326242]
A combined multimodal approach is proposed to detect hate speech in video content by extracting image features, audio features, and text, and applying machine learning and natural language processing.
arXiv Detail & Related papers (2023-06-15T06:46:52Z)
- HateMM: A Multi-Modal Dataset for Hate Video Classification [8.758311170297192]
We build deep learning multi-modal models to classify the hate videos and observe that using all the modalities improves the overall hate speech detection performance.
Our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute.
arXiv Detail & Related papers (2023-05-06T03:39:00Z)
- Models See Hallucinations: Evaluating the Factuality in Video Captioning [57.85548187177109]
We conduct a human evaluation of the factuality in video captioning and collect two annotated factuality datasets.
We find that 57.0% of the model-generated sentences have factual errors, indicating it is a severe problem in this field.
We propose a weakly-supervised, model-based factuality metric FactVC, which outperforms previous metrics on factuality evaluation of video captioning.
arXiv Detail & Related papers (2023-03-06T08:32:50Z)
- Video-Guided Curriculum Learning for Spoken Video Grounding [65.49979202728167]
We introduce a new task, spoken video grounding (SVG), which aims to localize the desired video fragments from spoken language descriptions.
To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) method.
In addition, we collect the first large-scale spoken video grounding dataset based on ActivityNet.
arXiv Detail & Related papers (2022-09-01T07:47:01Z)
- Self-Supervised Learning for speech recognition with Intermediate layer supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL).
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers.
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
arXiv Detail & Related papers (2021-12-16T10:45:05Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition [52.71604809100364]
We propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech.
Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
In addition to the existing contrastive learning task, we switch the quantized representations of the original and noisy speech as additional prediction targets.
arXiv Detail & Related papers (2021-10-11T00:08:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.