Unboxing Engagement in YouTube Influencer Videos: An Attention-Based Approach
- URL: http://arxiv.org/abs/2012.12311v4
- Date: Mon, 26 Aug 2024 15:34:13 GMT
- Title: Unboxing Engagement in YouTube Influencer Videos: An Attention-Based Approach
- Authors: Prashant Rajaram, Puneet Manchanda
- Abstract summary: The authors develop an "interpretable deep learning framework" that makes good out-of-sample predictions using unstructured data.
Inspired by visual attention in print advertising, the interpretation approach uses measures of model attention to video features.
The framework is applied to YouTube influencer videos, linking video features to measures of shallow and deep engagement.
- Score: 0.3686808512438362
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Influencer marketing videos have surged in popularity, yet significant gaps remain in understanding the relationship between video features and engagement. This challenge is intensified by the complexities of interpreting unstructured data. While deep learning models effectively leverage unstructured data to predict business outcomes, they often function as black boxes with limited interpretability, particularly when human validation is hindered by the absence of a known ground truth. To address this issue, the authors develop an "interpretable deep learning framework" that not only makes good out-of-sample predictions using unstructured data but also provides insights into the captured relationships. Inspired by visual attention in print advertising, the interpretation approach uses measures of model attention to video features, eliminating spurious associations through a two-step process and shortlisting relationships for formal causal testing. This method is applicable across well-known attention mechanisms - additive attention, scaled dot-product attention, and gradient-based attention - when analyzing text, audio, or video image data. Validated using simulations, this approach outperforms benchmark feature selection methods. This framework is applied to YouTube influencer videos, linking video features to measures of shallow and deep engagement developed based on the dual-system framework of thinking. The findings guide influencers and brands in prioritizing video features associated with deep engagement.
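The abstract names three attention mechanisms (additive, scaled dot-product, and gradient-based) that the framework reads as "model attention" to video features. The sketch below is a generic, minimal illustration of how each mechanism produces one weight per input feature. It is not the authors' implementation. The feature vectors, the projection matrices `W_q` and `W_k`, the scoring vector `v`, and the scoring function `f` are hypothetical placeholders. The finite-difference gradient stands in for the autodiff gradient a deep learning framework would normally supply.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    return [dot(row, x) for row in W]


def scaled_dot_product_weights(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention weights: softmax_i(q . k_i / sqrt(d))."""
    d = len(query)
    return softmax([dot(query, k) / math.sqrt(d) for k in keys])


def additive_weights(query: Vector, keys: List[Vector],
                     W_q: Matrix, W_k: Matrix, v: Vector) -> Vector:
    """Additive (Bahdanau-style) attention: softmax_i(v . tanh(W_q q + W_k k_i))."""
    q_proj = matvec(W_q, query)
    scores = []
    for k in keys:
        k_proj = matvec(W_k, k)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(dot(v, hidden))
    return softmax(scores)


def gradient_saliency(f: Callable[[Vector], float], x: Vector,
                      eps: float = 1e-5) -> Vector:
    """Gradient-based attention: |df/dx_i|, estimated by central differences."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append(abs(f(up) - f(down)) / (2 * eps))
    return grads


if __name__ == "__main__":
    # Hypothetical 2-d embeddings for three video segments (e.g., text, audio, image).
    segments = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
    query = [1.0, 0.0]  # hypothetical query vector
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print("scaled dot-product:", scaled_dot_product_weights(query, segments))
    print("additive:", additive_weights(query, segments, identity, identity, [1.0, 1.0]))
    # Hypothetical engagement score that is linear in three features.
    print("gradient:", gradient_saliency(lambda x: 3 * x[0] - 2 * x[1] + 0.5 * x[2],
                                         [1.0, 1.0, 1.0]))
```

In both softmax-based variants the weights sum to one, so they can be compared across features within a single input. The gradient-based variant yields unnormalized sensitivities instead. Treating any of these weights as evidence of a relationship is exactly the step the abstract says still needs filtering and formal causal testing.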
Related papers
- Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering [71.62961521518731]
HeurVidQA is a framework that leverages domain-specific entity-actions to refine pre-trained video-language foundation models.
Our approach treats these models as implicit knowledge engines, employing domain-specific entity-action prompters to direct the model's focus toward precise cues that enhance reasoning.
arXiv Detail & Related papers (2024-10-12T06:22:23Z)
- Compositional Video Generation as Flow Equalization [72.88137795439407]
Large-scale Text-to-Video (T2V) diffusion models have recently demonstrated unprecedented capability to transform natural language descriptions into stunning and photorealistic videos.
Despite the promising results, these models struggle to fully grasp complex compositional interactions between multiple concepts and actions.
We introduce Vico, a generic framework for compositional video generation that explicitly ensures all concepts are represented properly.
arXiv Detail & Related papers (2024-06-10T16:27:47Z)
- Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z)
- CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing [23.85763377992709]
We propose a novel interactive-enhanced cross-modal perception method (CM-PIE), which can learn fine-grained features by applying a segment-based attention module.
We show that our model offers improved parsing performance on the Look, Listen, and Parse dataset.
arXiv Detail & Related papers (2023-10-11T14:15:25Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized spatio-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better distinguish between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
- Attention improves concentration when learning node embeddings [1.2233362977312945]
Given nodes labelled with search query text, we want to predict links to related queries that share products.
Experiments with a range of deep neural architectures show that simple feedforward networks with an attention mechanism perform best for learning embeddings.
We propose an analytically tractable model of query generation, AttEST, that views both products and the query text as vectors embedded in a latent space.
arXiv Detail & Related papers (2020-06-11T21:21:12Z)
- Action Localization through Continual Predictive Learning [14.582013761620738]
We present a new approach based on continual learning that uses feature-level predictions for self-supervision.
We use a stack of LSTMs coupled with a CNN encoder, along with novel attention mechanisms, to model the events in the video and use this model to predict high-level features for the future frames.
This self-supervised framework is less complicated than other approaches but is very effective in learning robust visual representations for both labeling and localization.
arXiv Detail & Related papers (2020-03-26T23:32:43Z)