Unboxing Engagement in YouTube Influencer Videos: An Attention-Based Approach
- URL: http://arxiv.org/abs/2012.12311v5
- Date: Fri, 22 Nov 2024 20:24:33 GMT
- Title: Unboxing Engagement in YouTube Influencer Videos: An Attention-Based Approach
- Authors: Prashant Rajaram, Puneet Manchanda
- Abstract summary: We develop an 'interpretable deep learning framework' that provides insights into the relationships captured by the models.
Inspired by visual attention in print advertising, our interpretation approach uses measures of model attention to video features.
We apply our framework to YouTube influencer videos, linking video features to measures of shallow and deep engagement.
- Abstract: Influencer marketing videos have surged in popularity, yet significant gaps remain in understanding the relationships between video features and engagement. This challenge is intensified by the complexities of interpreting unstructured data. While deep learning models effectively leverage raw unstructured data to predict engagement, they often function as black boxes with limited interpretability, particularly when human validation is hindered by the absence of a known ground truth. To address this issue, we develop an 'interpretable deep learning framework' that provides insights into the relationships captured by the models. Inspired by visual attention in print advertising, our interpretation approach uses measures of model attention to video features, eliminating spurious associations through a two-step process and identifying a subset of relationships for formal causal testing. This approach is versatile, as it applies across well-known attention mechanisms - additive attention, scaled dot-product attention, and gradient-based attention - when analyzing text, audio, or video image data. We apply our framework to YouTube influencer videos, linking video features to measures of shallow and deep engagement developed based on the dual-system framework of thinking. Our findings guide influencers in prioritizing the design of video features associated with deep engagement sentiment.
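The abstract's interpretation approach applies across additive, scaled dot-product, and gradient-based attention. As a minimal illustrative sketch of the scaled dot-product variant only (not the authors' implementation; the function name and shapes are assumptions for illustration), the attention weights it produces are the quantities an interpretation framework like this one reads off as a measure of how much the model attends to each input feature:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over value vectors V using query/key similarity.

    Q: (n_q, d) queries; K: (n_k, d) keys; V: (n_k, d_v) values.
    Returns the attended output and the attention weights.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # e.g. 2 engagement-related queries
K = rng.standard_normal((3, 4))   # e.g. 3 video-feature keys
V = rng.standard_normal((3, 5))   # corresponding feature values
out, w = scaled_dot_product_attention(Q, K, V)
# Each row of w is a distribution over the 3 input features:
# larger entries indicate features the model attends to more.
```

Each row of the weight matrix sums to one, so it can be read directly as a relative-attention measure over input features; additive and gradient-based attention yield analogous per-feature scores by different means.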
Related papers
- Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence.
Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z)
- Admitting Ignorance Helps the Video Question Answering Models to Answer [82.22149677979189]
We argue that models often establish shortcuts, resulting in spurious correlations between questions and answers.
We propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question.
In practice, we integrate a state-of-the-art model into our framework to validate its effectiveness.
arXiv Detail & Related papers (2025-01-15T12:44:52Z)
- Enhancing Multi-Modal Video Sentiment Classification Through Semi-Supervised Clustering [0.0]
We aim to improve video sentiment classification by focusing on three key aspects: the video itself, the accompanying text, and the acoustic features.
We are developing a method that utilizes clustering-based semi-supervised pre-training to extract meaningful representations from the data.
arXiv Detail & Related papers (2025-01-11T08:04:39Z)
- Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering [71.62961521518731]
HeurVidQA is a framework that leverages domain-specific entity-actions to refine pre-trained video-language foundation models.
Our approach treats these models as implicit knowledge engines, employing domain-specific entity-action prompters to direct the model's focus toward precise cues that enhance reasoning.
arXiv Detail & Related papers (2024-10-12T06:22:23Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Identity-aware Graph Memory Network for Action Detection [37.65846189707054]
We explicitly highlight the identity information of the actors in terms of both long-term and short-term context through a graph memory network.
Specifically, we propose the identity-aware graph neural network (IGNN) to comprehensively conduct long-term relation modeling.
We develop a dual attention module (DAM) to generate an identity-aware constraint that reduces the influence of interference from actors of different identities.
arXiv Detail & Related papers (2021-08-26T02:34:55Z)
- Modeling High-order Interactions across Multi-interests for Micro-video Recommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance the user's interest representation.
In particular, we first use co-attention to model correlation patterns across different levels and then use self-attention to model correlation patterns within a specific level.
arXiv Detail & Related papers (2021-04-01T07:20:15Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning for Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify the key image pixels to intervene on, and show that it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
- Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition [19.93779132095822]
We argue that jointly learning features that intertwine these two information channels is beneficial.
We propose a single stream architecture able to do so, thanks to the addition of a self-supervised motion prediction block.
Experiments on several publicly available databases show the power of our approach.
arXiv Detail & Related papers (2020-02-10T17:51:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.