Television Discourse Decoded: Comprehensive Multimodal Analytics at Scale
- URL: http://arxiv.org/abs/2402.12629v2
- Date: Tue, 6 Aug 2024 07:08:12 GMT
- Title: Television Discourse Decoded: Comprehensive Multimodal Analytics at Scale
- Authors: Anmol Agarwal, Pratyush Priyadarshi, Shiven Sinha, Shrey Gupta, Hitkul Jangra, Ponnurangam Kumaraguru, Kiran Garimella
- Abstract summary: We tackle the complex task of analyzing televised debates, with a focus on a prime time news debate show from India.
Previous methods, which often relied solely on text, fall short in capturing the multimodal essence of these debates.
We introduce a comprehensive automated toolkit that employs advanced computer vision and speech-to-text techniques for large-scale multimedia analysis.
- Score: 5.965160962617209
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we tackle the complex task of analyzing televised debates, focusing on a prime-time news debate show from India. Previous methods, which often relied solely on text, fall short in capturing the multimodal essence of these debates. To address this gap, we introduce a comprehensive automated toolkit that employs state-of-the-art computer vision and speech-to-text techniques for large-scale multimedia analysis, and use it to transcribe, diarize, and analyze thousands of YouTube videos of a prime-time television debate show in India. These debates are a central part of Indian media but have been criticized for compromised journalistic integrity and excessive dramatization. Our toolkit provides concrete metrics to assess bias and incivility, capturing a comprehensive multimedia perspective that includes text, audio utterances, and video frames. Our findings reveal significant biases in topic selection and panelist representation, along with alarming levels of incivility. This work offers a scalable, automated approach for future research in multimedia analysis, with profound implications for the quality of public discourse and democratic debate. To catalyze further research in this area, we also release the code, the collected dataset, and a supplemental PDF.
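The abstract does not name the specific models inside the toolkit, so the following is only a minimal sketch of the transcribe-and-diarize stage it describes, assuming openai-whisper for ASR and pyannote.audio for diarization as stand-ins; the `analyze_episode` helper and its overlap heuristic are illustrative, not the released code.

```python
# Minimal sketch of a transcribe-and-diarize pipeline for debate episodes.
# Assumptions (not specified by the paper): openai-whisper for speech-to-text
# and pyannote.audio for speaker diarization.
import whisper
from pyannote.audio import Pipeline

def analyze_episode(audio_path: str, hf_token: str) -> list[dict]:
    """Return per-segment transcripts tagged with an anonymous speaker label."""
    asr = whisper.load_model("base")
    transcript = asr.transcribe(audio_path)  # {"segments": [{"start", "end", "text"}, ...]}

    diarizer = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    diarization = diarizer(audio_path)

    # Assign each ASR segment to the speaker whose turn overlaps it most.
    tagged = []
    for seg in transcript["segments"]:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            overlap = min(seg["end"], turn.end) - max(seg["start"], turn.start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        tagged.append({"speaker": best_speaker,
                       "start": seg["start"],
                       "end": seg["end"],
                       "text": seg["text"].strip()})
    return tagged
```

Downstream bias and incivility metrics would then be computed over the speaker-tagged segments this returns.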
Related papers
- More than Memes: A Multimodal Topic Modeling Approach to Conspiracy Theories on Telegram [0.0]
We explore the potential of multimodal topic modeling for analyzing conspiracy theories in German-language Telegram channels.
We analyze a corpus of 40,000 Telegram messages posted in October 2023 in 571 German-language Telegram channels.
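For orientation, a text-only topic-modeling pass over such a corpus can be sketched with BERTopic; the paper's multimodal setup additionally ingests images, which this simplified, assumed configuration omits.

```python
# Minimal, text-only sketch of topic modeling a message corpus with BERTopic.
from bertopic import BERTopic

# Placeholder corpus standing in for the 40,000 Telegram messages; BERTopic
# needs a reasonably large collection to fit its UMAP/HDBSCAN stages.
messages = [f"sample message {i} discussing theme {i % 5}" for i in range(1000)]

topic_model = BERTopic(language="multilingual")  # corpus is German-language
topics, _ = topic_model.fit_transform(messages)
print(topic_model.get_topic_info().head())
```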
arXiv Detail & Related papers (2024-10-11T09:10:26Z)
- A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot [67.00455874279383]
We propose verbalizing long videos to generate descriptions in natural language, then performing video-understanding tasks on the generated story as opposed to the original video.
Our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding.
To alleviate the lack of story-understanding benchmarks, we publicly release the first dataset on persuasion strategy identification, a crucial task in computational social science.
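In outline, the verbalization step can be sketched as follows; BLIP frame captioning and fixed-interval sampling are assumptions for illustration, not necessarily the paper's models. Any text-level task, such as persuasion strategy identification, can then be run on the returned story.

```python
# Sketch of verbalizing a video: caption sampled frames, then join the
# captions into a natural-language "story" for downstream text-only tasks.
import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def verbalize(video_path: str, every_n_sec: int = 10) -> str:
    """Caption one frame every `every_n_sec` seconds and join into a story."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    captions, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % int(fps * every_n_sec) == 0:
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            inputs = processor(images=image, return_tensors="pt")
            out = captioner.generate(**inputs, max_new_tokens=30)
            captions.append(processor.decode(out[0], skip_special_tokens=True))
        idx += 1
    cap.release()
    return " ".join(captions)
```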
arXiv Detail & Related papers (2023-05-16T19:13:11Z)
- Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection [115.83992775004043]
Recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost.
This paper provides a comprehensive review of the current media tampering detection approaches, and discusses the challenges and trends in this field for future research.
arXiv Detail & Related papers (2022-12-12T02:54:08Z)
- Inference of Media Bias and Content Quality Using Natural-Language Processing [6.092956184948962]
We present a framework to infer both political bias and content quality of media outlets from text.
We apply a bidirectional long short-term memory (LSTM) neural network to a dataset of more than 1 million tweets.
Our results illustrate the importance of incorporating word order into machine-learning methods for text analysis.
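As a rough illustration of that architecture, here is a toy Keras sketch of a bidirectional LSTM classifier over token IDs; the vocabulary size, layer widths, and single bias head are assumptions, not the paper's configuration.

```python
# Toy bidirectional LSTM text classifier in Keras.
import tensorflow as tf

VOCAB_SIZE = 20_000  # assumed tokenizer vocabulary

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # reads word order in both directions
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. a political-bias score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```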
arXiv Detail & Related papers (2022-12-01T03:04:55Z)
- Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z)
- Emotion Based Hate Speech Detection using Multimodal Learning [0.0]
This paper proposes the first multimodal deep learning framework to combine the auditory features representing emotion and the semantic features to detect hateful content.
Our results demonstrate that incorporating emotional attributes leads to significant improvement over text-based models in detecting hateful multimedia content.
arXiv Detail & Related papers (2022-02-13T05:39:47Z)
- Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review [1.0520692160489133]
This review categorizes and describes the state-of-the-art techniques for the video-to-text problem.
It covers the main video-to-text methods and the ways to evaluate their performance.
State-of-the-art techniques are still a long way from achieving human-like performance in generating or retrieving video descriptions.
arXiv Detail & Related papers (2021-03-27T02:12:28Z)
- A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis [19.783517380422854]
We present a first-of-its-kind Persian multimodal dataset comprising more than 800 utterances.
We present a novel context-aware multimodal sentiment analysis framework.
We employ both decision-level (late) and feature-level (early) fusion methods to integrate affective cross-modal information.
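A tiny numpy illustration of the two fusion styles named above; the feature dimensions and the averaging rule for late fusion are arbitrary choices for the sketch, not the paper's design.

```python
# Feature-level (early) fusion concatenates modality features before a
# single classifier; decision-level (late) fusion scores each modality
# separately and then combines the scores.
import numpy as np

text_feat = np.random.rand(128)   # assumed text embedding
audio_feat = np.random.rand(40)   # assumed acoustic features

# Early fusion: one joint vector feeds one downstream model.
joint = np.concatenate([text_feat, audio_feat])

# Late fusion: combine per-modality predictions, here by simple averaging.
p_text, p_audio = 0.7, 0.4        # per-modality positive-sentiment scores
p_late = (p_text + p_audio) / 2
print(joint.shape, p_late)        # (168,) 0.55
```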
arXiv Detail & Related papers (2021-03-03T19:09:01Z)
- LIFI: Towards Linguistically Informed Frame Interpolation [66.05105400951567]
We tackle frame interpolation by using several deep learning video generation algorithms to generate the missing frames.
We release several datasets to test the speech understanding of computer vision video generation models.
arXiv Detail & Related papers (2020-10-30T05:02:23Z)
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate.
We show that there is a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
- Video Captioning with Guidance of Multimodal Latent Topics [123.5255241103578]
We propose a unified caption framework, M&M TGM, which mines multimodal topics from data in an unsupervised fashion.
Compared to pre-defined topics, the mined multimodal topics are more semantically and visually coherent.
The results from extensive experiments conducted on the MSR-VTT and Youtube2Text datasets demonstrate the effectiveness of our proposed model.
arXiv Detail & Related papers (2017-08-31T11:18:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.