Related papers: MUFM: A Mamba-Enhanced Feedback Model for Micro Video Popularity Prediction

MUFM: A Mamba-Enhanced Feedback Model for Micro Video Popularity Prediction

URL: http://arxiv.org/abs/2411.15455v1
Date: Sat, 23 Nov 2024 05:13:27 GMT
Title: MUFM: A Mamba-Enhanced Feedback Model for Micro Video Popularity Prediction
Authors: Jiacheng Lu, Mingyuan Xiao, Weijian Wang, Yuxin Du, Yi Cui, Jingnan Zhao, Cheng Hua,
Abstract summary: We introduce a framework for capturing long-term dependencies in user feedback and dynamic event interactions. Our experiments on the large-scale open-source multi-modal dataset show that our model significantly outperforms state-of-the-art approaches by 23.2%.
Score: 1.7040391128945196
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The surge in micro-videos is transforming the concept of popularity. As researchers delve into vast multi-modal datasets, there is a growing interest in understanding the origins of this popularity and the forces driving its rapid expansion. Recent studies suggest that the virality of short videos is not only tied to their inherent multi-modal content but is also heavily influenced by the strength of platform recommendations driven by audience feedback. In this paper, we introduce a framework for capturing long-term dependencies in user feedback and dynamic event interactions, based on the Mamba Hawkes process. Our experiments on the large-scale open-source multi-modal dataset show that our model significantly outperforms state-of-the-art approaches across various metrics by 23.2%. We believe our model's capability to map the relationships within user feedback behavior sequences will not only contribute to the evolution of next-generation recommendation algorithms and platform applications but also enhance our understanding of micro video dissemination and its broader societal impact.

Related papers

Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model [55.58701436630489]
Cross-platform Short-Video dataset includes 117,720 videos, 381,926 samples, and 535 topics across 5 biggest Chinese platforms. Large Graph Model (LGM) named NetGPT can bridge heterogeneous graph-structured data with the powerful reasoning ability and knowledge of Large Language Models (LLMs) Our NetGPT can comprehend and analyze the short-video propagation graph, enabling it to predict the long-term propagation influence of short-videos.
arXiv Detail & Related papers (2025-03-31T05:53:15Z)
Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation [97.82707398481273]
We develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF) Based on the meta information extracted from the multimodal features of the input task, MetaMMF parameterizes a neural network as the item-specific fusion function via a meta learner. We perform extensive experiments on three benchmark datasets, demonstrating the significant improvements over several state-of-the-art multimodal recommendation models.
arXiv Detail & Related papers (2025-01-13T07:51:43Z)
Dreaming User Multimodal Representation Guided by The Platonic Representation Hypothesis for Micro-Video Recommendation [1.8604168495693911]
We introduce DreamUMM, a novel approach leveraging user historical behaviors to create real-time user representation in a multimoda space. DreamUMM employs a closed-form solution correlating user video preferences with multimodal similarity, hypothesizing that user interests can be effectively represented in a unified multimodal space. Our work contributes to the ongoing exploration of representational convergence by providing empirical evidence supporting the potential for user interest representations to reside in a multimodal space.
arXiv Detail & Related papers (2024-09-15T06:40:38Z)
DiffMM: Multi-Modal Diffusion Model for Recommendation [19.43775593283657]
We propose a novel multi-modal graph diffusion model for recommendation called DiffMM. Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning.
arXiv Detail & Related papers (2024-06-17T17:35:54Z)
MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion [18.499672566131355]
Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a conventional recommendation problem. We propose MMBee based on real-time Multi-Modal Fusion and Behaviour Expansion to address these issues.
arXiv Detail & Related papers (2024-06-15T04:59:00Z)
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward [118.65089648651308]
This paper introduces a novel framework that utilizes detailed video captions as a proxy of video content. We show that applying this tailored reward through DPO significantly improves the performance of video LMMs on video Question Answering (QA) tasks.
arXiv Detail & Related papers (2024-04-01T17:28:16Z)
Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
Multi-queue Momentum Contrast for Microvideo-Product Retrieval [57.527227171945796]
We formulate the microvideo-product retrieval task, which is the first attempt to explore the retrieval between the multi-modal and multi-modal instances. A novel approach named Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval. A discriminative selection strategy with a multi-queue is used to distinguish the importance of different negatives based on their categories.
arXiv Detail & Related papers (2022-12-22T03:47:14Z)
Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features. We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors. Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
Modeling High-order Interactions across Multi-interests for Micro-video Reommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance user's interest representation. In particular, we first use co-attention to model correlation patterns across different levels and then use self-attention to model correlation patterns within a specific level.
arXiv Detail & Related papers (2021-04-01T07:20:15Z)
Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework [54.194340961353944]
We propose a multimodal variational encoder-decoder framework for micro-video popularity tasks. MMVED learns a prediction embedding of a micro-video that is informative to its popularity level. Experiments conducted on a public dataset and a dataset we collect from Xigua demonstrate the effectiveness of the proposed MMVED framework.
arXiv Detail & Related papers (2020-03-28T06:08:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.