Unboxing Engagement in YouTube Influencer Videos: An Attention-Based Approach
- URL: http://arxiv.org/abs/2012.12311v5
- Date: Fri, 22 Nov 2024 20:24:33 GMT
- Title: Unboxing Engagement in YouTube Influencer Videos: An Attention-Based Approach
- Authors: Prashant Rajaram, Puneet Manchanda
- Abstract summary: We develop an 'interpretable deep learning framework' that provides insights into the relationships captured by the models.
Inspired by visual attention in print advertising, our interpretation approach uses measures of model attention to video features.
We apply our framework to YouTube influencer videos, linking video features to measures of shallow and deep engagement.
- Abstract: Influencer marketing videos have surged in popularity, yet significant gaps remain in understanding the relationships between video features and engagement. This challenge is intensified by the complexities of interpreting unstructured data. While deep learning models effectively leverage raw unstructured data to predict engagement, they often function as black boxes with limited interpretability, particularly when human validation is hindered by the absence of a known ground truth. To address this issue, we develop an 'interpretable deep learning framework' that provides insights into the relationships captured by the models. Inspired by visual attention in print advertising, our interpretation approach uses measures of model attention to video features, eliminating spurious associations through a two-step process and identifying a subset of relationships for formal causal testing. This approach is versatile, as it applies across well-known attention mechanisms - additive attention, scaled dot-product attention, and gradient-based attention - when analyzing text, audio, or video image data. We apply our framework to YouTube influencer videos, linking video features to measures of shallow and deep engagement developed based on the dual-system framework of thinking. Our findings guide influencers in prioritizing the design of video features associated with deep engagement sentiment.
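The abstract's interpretation approach applies across additive, scaled dot-product, and gradient-based attention. As a minimal illustrative sketch of the scaled dot-product variant only (not the authors' implementation; the function name and shapes are assumptions for illustration), the attention weights it produces are the quantities an interpretation framework like this one reads off as a measure of how much the model attends to each input feature:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over value vectors V using query/key similarity.

    Q: (n_q, d) queries; K: (n_k, d) keys; V: (n_k, d_v) values.
    Returns the attended output and the attention weights.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # e.g. 2 engagement-related queries
K = rng.standard_normal((3, 4))   # e.g. 3 video-feature keys
V = rng.standard_normal((3, 5))   # corresponding feature values
out, w = scaled_dot_product_attention(Q, K, V)
# Each row of w is a distribution over the 3 input features:
# larger entries indicate features the model attends to more.
```

Each row of the weight matrix sums to one, so it can be read directly as a relative-attention measure over input features; additive and gradient-based attention yield analogous per-feature scores by different means.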
Related papers
- Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence.
Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z)
- Admitting Ignorance Helps the Video Question Answering Models to Answer [82.22149677979189]
We argue that models often establish shortcuts, resulting in spurious correlations between questions and answers.
We propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question.
In practice, we integrate a state-of-the-art model into our framework to validate its effectiveness.
arXiv Detail & Related papers (2025-01-15T12:44:52Z)
- Enhancing Multi-Modal Video Sentiment Classification Through Semi-Supervised Clustering [0.0]
We aim to improve video sentiment classification by focusing on three key aspects: the video itself, the accompanying text, and the acoustic features.
We are developing a method that utilizes clustering-based semi-supervised pre-training to extract meaningful representations from the data.
arXiv Detail & Related papers (2025-01-11T08:04:39Z)
- Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering [71.62961521518731]
HeurVidQA is a framework that leverages domain-specific entity-actions to refine pre-trained video-language foundation models.
Our approach treats these models as implicit knowledge engines, employing domain-specific entity-action prompters to direct the model's focus toward precise cues that enhance reasoning.
arXiv Detail & Related papers (2024-10-12T06:22:23Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Identity-aware Graph Memory Network for Action Detection [37.65846189707054]
We explicitly highlight the identity information of the actors in terms of both long-term and short-term context through a graph memory network.
Specifically, we propose the identity-aware graph neural network (IGNN) to comprehensively conduct long-term relation modeling.
We develop a dual attention module (DAM) to generate an identity-aware constraint that reduces the influence of interference from actors of different identities.
arXiv Detail & Related papers (2021-08-26T02:34:55Z)
- Modeling High-order Interactions across Multi-interests for Micro-video Recommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance the user's interest representation.
In particular, we first use co-attention to model correlation patterns across different levels and then use self-attention to model correlation patterns within a specific level.
arXiv Detail & Related papers (2021-04-01T07:20:15Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning for Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify the key image pixels to intervene on, and show that it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
- Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition [19.93779132095822]
We argue that jointly learning features that intertwine these two information channels is beneficial.
We propose a single stream architecture able to do so, thanks to the addition of a self-supervised motion prediction block.
Experiments on several publicly available databases show the power of our approach.
arXiv Detail & Related papers (2020-02-10T17:51:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.