COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations
- URL: http://arxiv.org/abs/2412.17180v1
- Date: Sun, 22 Dec 2024 22:43:36 GMT
- Title: COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations
- Authors: Vanessa Su, Nirmalya Thakur
- Abstract summary: This study presents a data-driven analysis of COVID-19 discourse on YouTube.
It examines the sentiment, toxicity, and thematic patterns of video content published between January 2023 and October 2024.
A recommendation system was developed to ensure relevant and context-aligned video recommendations.
- Score: 0.0
- Abstract: This study presents a data-driven analysis of COVID-19 discourse on YouTube, examining the sentiment, toxicity, and thematic patterns of video content published between January 2023 and October 2024. The analysis involved applying advanced natural language processing (NLP) techniques: sentiment analysis with VADER, toxicity detection with Detoxify, and topic modeling using Latent Dirichlet Allocation (LDA). The sentiment analysis revealed that 49.32% of video descriptions were positive, 36.63% were neutral, and 14.05% were negative, indicating a generally informative and supportive tone in pandemic-related content. Toxicity analysis identified only 0.91% of content as toxic, suggesting minimal exposure to toxic content. Topic modeling revealed two main themes, with 66.74% of the videos covering general health information and pandemic-related impacts and 33.26% focused on news and real-time updates, highlighting the dual informational role of YouTube. A recommendation system was also developed using TF-IDF vectorization and cosine similarity, refined by sentiment, toxicity, and topic filters to ensure relevant and context-aligned video recommendations. This system achieved 69% aggregate coverage, with monthly coverage rates consistently above 85%, demonstrating robust performance and adaptability over time. Evaluation across recommendation sizes showed coverage reaching 69% for five video recommendations and 79% for ten video recommendations per video. In summary, this work presents a framework for understanding COVID-19 discourse on YouTube and a recommendation system that supports user engagement while promoting responsible and relevant content related to COVID-19.
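The recommendation step described in the abstract (TF-IDF vectorization, cosine similarity, then sentiment, toxicity, and topic filters) could be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the record fields (`description`, `sentiment`, `toxicity`, `topic`), the `max_toxicity` threshold, the rule that a candidate must share the seed video's sentiment and topic, and the definition of coverage as the share of catalog videos that appear in at least one recommendation list are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(videos, k=5, max_toxicity=0.5):
    """Map each video index to up to k similar, filtered video indices.

    Assumed filter rules (hypothetical): drop candidates whose toxicity
    exceeds max_toxicity, and keep only candidates that share the seed
    video's topic and sentiment label.
    """
    vectors = tfidf_vectors([video["description"] for video in videos])
    recommendations = {}
    for i, seed in enumerate(videos):
        scored = []
        for j, candidate in enumerate(videos):
            if j == i or candidate["toxicity"] > max_toxicity:
                continue
            if (candidate["topic"] != seed["topic"]
                    or candidate["sentiment"] != seed["sentiment"]):
                continue
            similarity = cosine(vectors[i], vectors[j])
            if similarity > 0:
                scored.append((similarity, j))
        scored.sort(reverse=True)  # most similar first
        recommendations[i] = [j for _, j in scored[:k]]
    return recommendations


def catalog_coverage(recommendations, n_items):
    """Share of catalog items that appear in at least one recommendation list."""
    recommended = {j for items in recommendations.values() for j in items}
    return len(recommended) / n_items


# Toy records with made-up labels, only to exercise the functions above.
videos = [
    {"description": "covid vaccine health guidance update",
     "sentiment": "positive", "toxicity": 0.01, "topic": 0},
    {"description": "covid vaccine health advice for families",
     "sentiment": "positive", "toxicity": 0.02, "topic": 0},
    {"description": "breaking news covid case numbers today",
     "sentiment": "neutral", "toxicity": 0.01, "topic": 1},
    {"description": "covid vaccine health rant",
     "sentiment": "positive", "toxicity": 0.95, "topic": 0},
]
recs = recommend(videos, k=5)
print(recs, catalog_coverage(recs, len(videos)))
```

In practice the sentiment, toxicity, and topic labels would come from the tools named in the abstract (VADER, Detoxify, and LDA) rather than being entered by hand, and a library vectorizer would normally replace the hand-written TF-IDF. Raising `k` can only keep coverage equal or increase it under this definition, which is consistent with the reported rise from 69% at five recommendations to 79% at ten.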
Related papers
- CogMorph: Cognitive Morphing Attacks for Text-to-Image Models [65.38747950692752]
This paper reveals a significant and previously unrecognized ethical risk inherent in text-to-image (T2I) generative models.
We introduce a novel method, termed the Cognitive Morphing Attack (CogMorph), which manipulates T2I models to generate images that retain the original core subjects but embed toxic or harmful contextual elements.
arXiv Detail & Related papers (2025-01-21T01:45:56Z) - CineXDrama: Relevance Detection and Sentiment Analysis of Bangla YouTube Comments on Movie-Drama using Transformers: Insights from Interpretability Tool [0.0]
We propose a system that first assesses the relevance of comments and then analyzes the sentiment of those deemed relevant.
We introduce a dataset of 14,000 manually collected and preprocessed comments, annotated for relevance (relevant or irrelevant) and sentiment (positive or negative).
arXiv Detail & Related papers (2024-11-10T18:04:41Z) - ToxVidLM: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos [46.148023197749396]
ToxVidLM incorporates three key modules - the multimodal module, Cross-Modal Synchronization module, and Multitask module.
This paper introduces a benchmark dataset consisting of 931 videos with 4021 code-mixed Hindi-English utterances collected from YouTube.
arXiv Detail & Related papers (2024-05-31T05:40:56Z) - Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims that is not observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z) - UATVR: Uncertainty-Adaptive Text-Video Retrieval [90.8952122146241]
A common practice is to transfer text-video pairs to the same embedding space and craft cross-modal interactions with certain entities.
We propose an Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models each look-up as a distribution matching procedure.
arXiv Detail & Related papers (2023-01-16T08:43:17Z) - "Learn the Facts About COVID-19": Analyzing the Use of Warning Labels on TikTok Videos [12.196005698116858]
We analyze the use of warning labels on TikTok, focusing on COVID-19 videos.
Our analysis shows that TikTok broadly applies warning labels on TikTok videos.
More worrying is the addition of COVID-19 warning labels on videos where their actual content is not related to COVID-19.
arXiv Detail & Related papers (2022-01-19T17:05:23Z) - Global Tweet Mentions of COVID-19 [3.3043776328952226]
We present an open-source dataset of 1.92 million keyword-selected Twitter posts, updated weekly from January 2020 to present.
The dashboard presents 100% of the geotagged tweets that contain keywords or hashtags related to COVID-19.
With emerging COVID variants but ongoing vaccine hesitancy and resistance, this dataset could be used by researchers to study numerous aspects of COVID-19.
arXiv Detail & Related papers (2021-08-13T20:21:29Z) - Reliability of Content and Echo Chambers on YouTube during the COVID-19 Debate [0.0]
This paper aims to investigate information diffusion during the COVID-19 pandemic by evaluating news consumption on YouTube.
We analyse more than 2 million users' engagement with 13,000 videos released by 68 YouTube channels labelled with a political bias and fact-checking index.
arXiv Detail & Related papers (2021-06-16T10:44:29Z) - COVIDx-US -- An open-access benchmark dataset of ultrasound imaging data for AI-driven COVID-19 analytics [116.6248556979572]
COVIDx-US is an open-access benchmark dataset of COVID-19 related ultrasound imaging data.
It consists of 93 lung ultrasound videos and 10,774 processed images of patients infected with SARS-CoV-2 pneumonia, non-SARS-CoV-2 pneumonia, as well as healthy control cases.
arXiv Detail & Related papers (2021-03-18T03:31:33Z) - Classification supporting COVID-19 diagnostics based on patient survey data [82.41449972618423]
Logistic regression and XGBoost classifiers that allow for effective screening of patients for COVID-19 were generated.
The obtained classification models provided the basis for the DECODE service (decode.polsl.pl), which can serve as support in screening patients with COVID-19 disease.
This data set consists of more than 3,000 examples and is based on questionnaires collected at a hospital in Poland.
arXiv Detail & Related papers (2020-11-24T17:44:01Z) - Accelerating COVID-19 Differential Diagnosis with Explainable Ultrasound Image Analysis [7.471424290647929]
We provide the largest publicly available lung ultrasound (US) dataset for COVID-19 consisting of 106 videos.
We propose a frame-based convolutional neural network that correctly classifies COVID-19 US videos with a sensitivity of 0.98 ± 0.04 and a specificity of 0.91 ± 0.08.
arXiv Detail & Related papers (2020-09-13T23:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.