Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework
- URL: http://arxiv.org/abs/2003.12724v1
- Date: Sat, 28 Mar 2020 06:08:16 GMT
- Title: Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework
- Authors: Yaochen Zhu, Jiayi Xie, Zhenzhong Chen
- Abstract summary: We propose a multimodal variational encoder-decoder (MMVED) framework for micro-video popularity prediction.
MMVED learns a stochastic Gaussian embedding of a micro-video that is informative about its popularity level.
Experiments conducted on a public dataset and a dataset we collected from Xigua demonstrate the effectiveness of the proposed MMVED framework.
- Score: 54.194340961353944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an emerging type of user-generated content, micro-video drastically
enriches people's entertainment experiences and social interactions. However,
the popularity pattern of an individual micro-video remains elusive to
researchers. One major challenge is that the potential popularity of a
micro-video tends to fluctuate under the impact of various external factors,
which makes it inherently uncertain. In addition, since micro-videos are mainly
uploaded by individuals who lack professional production techniques, multiple
types of noise can obscure the useful information. In this paper, we propose a
multimodal variational encoder-decoder (MMVED) framework for micro-video
popularity prediction. MMVED learns a stochastic Gaussian embedding of a
micro-video that is informative about its popularity level while simultaneously
preserving the inherent uncertainties. Moreover, through the optimization of a
deep variational information bottleneck lower bound (IBLBO), the learned hidden
representation is shown to be maximally expressive of the popularity target
while maximally compressive of the noise in micro-video features. Furthermore,
the Bayesian product-of-experts principle is applied to the multimodal encoder,
so that the decision to keep or discard information is made jointly across all
available modalities. Extensive experiments conducted on a public dataset and a
dataset we collected from Xigua demonstrate the effectiveness of the proposed
MMVED framework.
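For orientation, here is a minimal sketch of the standard deep variational information bottleneck lower bound (in the form popularized by Alemi et al., 2017); the paper's IBLBO is presumably of this general shape, though its exact form is not reproduced here. With encoder q(z|x), decoder p(y|z), variational marginal r(z), and trade-off weight β, maximizing this objective lower-bounds I(Z;Y) − β·I(Z;X) up to an additive constant:

```latex
\mathcal{L}_{\mathrm{IB}}
  = \underbrace{\mathbb{E}_{p(x,y)}\,\mathbb{E}_{q(z \mid x)}\big[\log p(y \mid z)\big]}_{\text{expressive of the target } y}
  \;-\;
  \beta \,\underbrace{\mathbb{E}_{p(x)}\,\mathrm{KL}\big(q(z \mid x)\,\big\|\,r(z)\big)}_{\text{compressive of the input } x}
```

The first term rewards embeddings that predict the popularity target; the KL penalty squeezes out input noise, matching the "maximally expressive / maximally compressive" trade-off described in the abstract.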
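As a concrete illustration of the two encoder-side ideas (the stochastic Gaussian embedding and the product-of-experts fusion), here is a minimal PyTorch sketch. All module names, layer sizes, and the unit-Gaussian prior expert are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn


class PoEMultimodalEncoder(nn.Module):
    """Sketch of a multimodal Gaussian encoder fused by a product of experts.

    Each modality m yields an expert N(mu_m, var_m); a product of Gaussian
    experts is again Gaussian and is combined by precision weighting. A unit
    Gaussian "prior expert" keeps the product well defined when a modality
    carries little information. All sizes here are illustrative assumptions.
    """

    def __init__(self, modality_dims, latent_dim=64):
        super().__init__()
        self.latent_dim = latent_dim
        # One small head per modality, emitting (mu_m, log_var_m).
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 2 * latent_dim))
            for d in modality_dims
        )

    def forward(self, modality_feats):
        # Prior expert N(0, I), stored as (mean, precision).
        ref = modality_feats[0]
        mus = [ref.new_zeros(ref.size(0), self.latent_dim)]
        precs = [ref.new_ones(ref.size(0), self.latent_dim)]
        for head, x in zip(self.heads, modality_feats):
            mu, log_var = head(x).chunk(2, dim=-1)
            mus.append(mu)
            precs.append(torch.exp(-log_var))  # precision = 1 / variance
        # Product of Gaussian experts: precisions add; means are precision-weighted.
        prec = torch.stack(precs).sum(dim=0)
        mu = torch.stack([m * p for m, p in zip(mus, precs)]).sum(dim=0) / prec
        var = 1.0 / prec
        # Reparameterization trick: stochastic Gaussian embedding z = mu + sigma * eps.
        z = mu + var.sqrt() * torch.randn_like(mu)
        return z, mu, var
```

At training time a decoder network p(y|z) would regress the popularity target from z, and the KL term of the bound above has a closed form given (mu, var) and the unit-Gaussian prior.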
Related papers
- Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation [97.82707398481273]
We develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF).
Based on the meta information extracted from the multimodal features of the input task, MetaMMF parameterizes a neural network as the item-specific fusion function via a meta learner.
We perform extensive experiments on three benchmark datasets, demonstrating significant improvements over several state-of-the-art multimodal recommendation models.
arXiv Detail & Related papers (2025-01-13T07:51:43Z)
- Multi-Aggregator Time-Warping Heterogeneous Graph Neural Network for Personalized Micro-Video Recommendation [3.0734655107038713]
Graph Neural Network-based micro-video recommendation has shown performance improvements across many kinds of recommendation tasks.
In this paper, a novel Multi-aggregator Time-warping Heterogeneous Graph Neural Network (MTHGNN) is proposed for personalized recommendation of news-style micro-videos.
arXiv Detail & Related papers (2025-01-05T21:14:35Z)
- MUFM: A Mamba-Enhanced Feedback Model for Micro Video Popularity Prediction [1.7040391128945196]
We introduce a framework for capturing long-term dependencies in user feedback and dynamic event interactions.
Our experiments on a large-scale open-source multi-modal dataset show that our model significantly outperforms state-of-the-art approaches by 23.2%.
arXiv Detail & Related papers (2024-11-23T05:13:27Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and the query.
Experimental results on an existing multi-modal video summarization dataset show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- Multi-queue Momentum Contrast for Microvideo-Product Retrieval [57.527227171945796]
We formulate the microvideo-product retrieval task, the first attempt to explore retrieval between multi-modal instances.
A novel Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval.
A discriminative selection strategy with a multi-queue is used to distinguish the importance of different negatives based on their categories.
arXiv Detail & Related papers (2022-12-22T03:47:14Z)
- Modeling High-order Interactions across Multi-interests for Micro-video Recommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance users' interest representations.
In particular, we first use co-attention to model correlation patterns across different levels, and then use self-attention to model correlation patterns within a specific level (a minimal sketch follows this list).
arXiv Detail & Related papers (2021-04-01T07:20:15Z)
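To make the last entry's "co-attention then self-attention" pattern concrete, here is a minimal, hypothetical PyTorch sketch; the module name, shapes, and head count are assumptions for illustration and do not reproduce that paper's exact architecture:

```python
import torch
import torch.nn as nn


class SelfOverCoAttention(nn.Module):
    """Illustrative sketch: co-attention across two interest levels,
    then self-attention within the refined level (not the paper's exact design)."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.co_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, level_a, level_b):
        # Co-attention: level-a items attend to level-b items,
        # modeling correlation patterns across levels.
        cross, _ = self.co_attn(level_a, level_b, level_b)
        # Self-attention: model correlation patterns within the refined level.
        refined, _ = self.self_attn(cross, cross, cross)
        return refined
```

The two stages mirror the summary's description: cross-level correlations are captured first, and the result is then refined by within-level correlations.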
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.