Short Video Segment-level User Dynamic Interests Modeling in Personalized Recommendation
- URL: http://arxiv.org/abs/2504.04237v2
- Date: Wed, 23 Apr 2025 02:35:55 GMT
- Title: Short Video Segment-level User Dynamic Interests Modeling in Personalized Recommendation
- Authors: Zhiyu He, Zhixin Ling, Jiayu Li, Zhiqiang Guo, Weizhi Ma, Xinchen Luo, Min Zhang, Guorui Zhou
- Abstract summary: Short video growth has necessitated effective recommender systems to match users with content tailored to their evolving preferences. Current video recommendation models primarily treat each video as a whole, overlooking the dynamic nature of user preferences with specific video segments. We propose an innovative model that integrates a hybrid representation module, a multi-modal user-video encoder, and a segment interest decoder.
- Score: 23.082810471266235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of short videos has necessitated effective recommender systems to match users with content tailored to their evolving preferences. Current video recommendation models primarily treat each video as a whole, overlooking the dynamic nature of user preferences for specific video segments. In contrast, our research focuses on segment-level user interest modeling, which is crucial for understanding how users' preferences evolve during video browsing. To capture users' dynamic segment interests, we propose an innovative model that integrates a hybrid representation module, a multi-modal user-video encoder, and a segment interest decoder. Our model addresses the challenges of capturing dynamic interest patterns, missing segment-level labels, and fusing different modalities, achieving precise segment-level interest prediction. We present two downstream tasks to evaluate the effectiveness of our segment interest modeling approach: video-skip prediction and short video recommendation. Our experiments on real-world short video datasets with diverse modalities show promising results on both tasks. These results demonstrate that segment-level interest modeling brings a deep understanding of user engagement and enhances video recommendations. We also release a unique dataset that includes segment-level video data and diverse user behaviors, enabling further research in segment-level interest modeling. This work pioneers a novel perspective on understanding user segment-level preference, offering the potential for more personalized and engaging short video experiences.
Related papers
- Generate the browsing process for short-video recommendation [10.110926043437113]
This paper introduces a new model to generate the browsing process for short-video recommendation.
It proposes a novel Segment Content Aware Model via User Engagement Feedback (SCAM) for watch time prediction in video recommendation.
arXiv Detail & Related papers (2025-04-02T20:54:52Z)
- Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended? [22.191260650245443]
Video segmentation aims at partitioning video sequences into meaningful segments based on objects or regions of interest within frames.
Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets.
We propose a training strategy Masked Video Consistency, which enhances spatial and temporal feature aggregation.
arXiv Detail & Related papers (2024-08-20T08:08:32Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Modeling High-order Interactions across Multi-interests for Micro-video Recommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance user's interest representation.
In particular, we first use co-attention to model correlation patterns across different levels and then use self-attention to model correlation patterns within a specific level.
arXiv Detail & Related papers (2021-04-01T07:20:15Z)
- Comprehensive Information Integration Modeling Framework for Video Titling [124.11296128308396]
We integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization.
We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform.
arXiv Detail & Related papers (2020-06-24T10:38:15Z)