Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network
- URL: http://arxiv.org/abs/1912.12655v1
- Date: Sun, 29 Dec 2019 14:09:32 GMT
- Title: Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network
- Authors: Washington L. S. Ramos, Michel M. Silva, Edson R. Araujo, Alan C. Neves, Erickson R. Nascimento
- Abstract summary: We present a new approach to automatically creating personalized fast-forward videos for First-Person Videos (FPVs).
Our approach explores the availability of text-centric data from the user's social networks to infer her/his topics of interest and assigns scores to the input frames according to her/his preferences.
- Score: 9.353403626477135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growth of Social Networks has fueled the habit of people logging their day-to-day activities, and long First-Person Videos (FPVs) are one of the main tools in this new habit. Semantic-aware fast-forward methods are able to decrease the watch time and select meaningful moments, which is key to increasing the chances of these videos being watched. However, these methods cannot handle semantics in terms of personalization. In this work, we present a new approach to automatically creating personalized fast-forward videos for FPVs. Our approach explores the availability of text-centric data from the user's social networks, such as status updates, to infer her/his topics of interest and assigns scores to the input frames according to her/his preferences. Extensive experiments are conducted on three different datasets with simulated and real-world users as input, achieving an average F1 score up to 12.8 percentage points higher than the best competitors. We also present a user study to demonstrate the effectiveness of our method.
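The abstract describes the method only at a high level: infer the user's topics of interest from text-centric social-network data, then score each input frame according to those preferences. As a rough illustration of that idea (not the authors' implementation), the sketch below assumes hypothetical inputs, a list of the user's text posts and, per frame, a list of detected visual concept labels, and matches them through a simple word-frequency profile.

```python
# Minimal sketch (not the authors' code): score frames for a personalized
# fast-forward by matching visual concepts detected in each frame against a
# topic-of-interest profile built from a user's social-network posts.
# All inputs (user_posts, frame concept labels) are illustrative assumptions.
import math
import re
from collections import Counter

def topic_profile(user_posts):
    """Build an L2-normalized word-frequency profile from the user's text posts."""
    tokens = []
    for post in user_posts:
        tokens += re.findall(r"[a-z]+", post.lower())
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}

def frame_score(frame_concepts, profile):
    """Cosine-style match between a frame's detected concept labels and the profile."""
    counts = Counter(c.lower() for c in frame_concepts)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return sum(profile.get(w, 0.0) * (c / norm) for w, c in counts.items())

# Toy usage: a user who posts about dogs gets higher scores on dog frames.
profile = topic_profile(["Walked my dog at the park today", "dog training tips"])
frames = [["dog", "park", "person"], ["car", "road"], ["dog", "ball"]]
print([round(frame_score(f, profile), 3) for f in frames])
```

The paper's actual topic inference and frame scoring are more elaborate than raw word counts; the cosine-style matching above only stands in for that scoring step.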
Related papers
- Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model [55.58701436630489]
The Cross-platform Short-Video dataset includes 117,720 videos, 381,926 samples, and 535 topics across the 5 biggest Chinese platforms.
A Large Graph Model (LGM) named NetGPT can bridge heterogeneous graph-structured data with the powerful reasoning ability and knowledge of Large Language Models (LLMs).
Our NetGPT can comprehend and analyze the short-video propagation graph, enabling it to predict the long-term propagation influence of short-videos.
arXiv Detail & Related papers (2025-03-31T05:53:15Z) - Delving Deep into Engagement Prediction of Short Videos [34.38399476375175]
This study delves deep into the intricacies of predicting engagement for newly published videos with limited user interactions.
We introduce a substantial dataset comprising 90,000 real-world short videos from Snapchat.
Our method demonstrates its ability to predict engagements of short videos purely from video content.
arXiv Detail & Related papers (2024-09-30T23:57:07Z) - SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis [15.246875830547056]
We propose a white-box statistical framework that translates various user behavior assumptions in watching (short) videos into statistical watch time models.
We test our models extensively on two public datasets, a large-scale offline industrial dataset, and an online A/B test on a short video platform with hundreds of millions of daily-active users.
arXiv Detail & Related papers (2024-08-14T18:19:35Z) - VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding [63.075626670943116]
We introduce a cutting-edge framework, VaQuitA, designed to refine the synergy between video and textual information.
At the data level, instead of sampling frames uniformly, we implement a sampling method guided by CLIP-score rankings.
At the feature level, we integrate a trainable Video Perceiver alongside a Visual-Query Transformer.
arXiv Detail & Related papers (2023-12-04T19:48:02Z) - Video-based Person Re-identification with Long Short-Term Representation Learning [101.62570747820541]
Video-based person Re-Identification (V-ReID) aims to retrieve specific persons from raw videos captured by non-overlapped cameras.
We propose a novel deep learning framework named Long Short-Term Representation Learning (LSTRL) for effective V-ReID.
arXiv Detail & Related papers (2023-08-07T16:22:47Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions, by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z) - QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset.
It consists of over 10,000 YouTube videos, covering a wide range of topics.
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips.
arXiv Detail & Related papers (2021-07-20T16:42:58Z) - Full-Body Awareness from Partial Observations [17.15829643665034]
We propose a self-training framework that adapts human 3D mesh recovery systems to consumer videos.
We show that our method substantially improves PCK and human-subject judgments compared to baselines.
arXiv Detail & Related papers (2020-08-13T17:59:11Z) - Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z) - A gaze driven fast-forward method for first-person videos [2.362412515574206]
We address the problem of accessing relevant information in First-Person Videos by creating an accelerated version of the input video and emphasizing the important moments to the recorder.
Our method is based on an attention model driven by gaze and visual scene analysis that provides a semantic score of each frame of the input video.
arXiv Detail & Related papers (2020-06-10T00:08:42Z) - Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data [1.004766879203303]
We present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos.
Our approach can adaptively discard frames that are not relevant to conveying the information, without creating gaps in the final video (a minimal frame-selection sketch in this spirit appears after this list).
We propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space.
arXiv Detail & Related papers (2020-03-31T14:07:45Z)
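Several entries above (the gaze-driven method and "Straight to the Point", both fast-forward approaches) reduce the problem to selecting frames from per-frame relevance scores. The sketch below, referenced from the "Straight to the Point" item, is a minimal illustration under assumed inputs and is not taken from any of the papers: it keeps roughly one frame per `speedup`, preferring high-scoring frames, while capping the longest consecutive skip so the accelerated video has no gaps.

```python
# Minimal sketch (not from the papers above): semantic fast-forwarding as
# constrained frame selection. `scores`, `speedup`, and `max_skip` are
# illustrative assumptions.

def select_frames(scores, speedup=8, max_skip=12):
    selected = [0]                      # always keep the first frame
    i = 0
    while i < len(scores) - 1:
        # candidate frames reachable without exceeding the maximum skip
        lo, hi = i + 1, min(i + max_skip, len(scores) - 1)
        # prefer high-score frames close to the nominal step i + speedup
        best = max(range(lo, hi + 1),
                   key=lambda j: scores[j] - 0.01 * abs(j - (i + speedup)))
        selected.append(best)
        i = best
    return selected

# Toy usage: 100 frames, frames 40-50 are "relevant"; the selection snaps to
# them whenever they fall inside the candidate window.
scores = [1.0 if 40 <= k <= 50 else 0.1 for k in range(100)]
print(select_frames(scores))
```

A real system would also enforce the overall speed-up rate and smooth the skips (e.g., via dynamic programming or, as in the last entry, reinforcement learning); the greedy rule here is only meant to show how per-frame semantic scores turn into a shorter video.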