Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework
- URL: http://arxiv.org/abs/2003.12724v1
- Date: Sat, 28 Mar 2020 06:08:16 GMT
- Title: Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework
- Authors: Yaochen Zhu, Jiayi Xie, Zhenzhong Chen
- Abstract summary: We propose a multimodal variational encoder-decoder (MMVED) framework for micro-video popularity prediction.
MMVED learns a stochastic Gaussian embedding of a micro-video that is informative about its popularity level.
Experiments conducted on a public dataset and a dataset we collected from Xigua demonstrate the effectiveness of the proposed MMVED framework.
- Score: 54.194340961353944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an emerging type of user-generated content, micro-video drastically
enriches people's entertainment experiences and social interactions. However,
the popularity pattern of an individual micro-video remains elusive to
researchers. One major challenge is that the potential popularity of a
micro-video tends to fluctuate under the impact of various external factors,
which makes it inherently uncertain. In addition, since micro-videos are mainly
uploaded by individuals who lack professional production techniques, multiple
types of noise can obscure the useful information. In this paper, we propose a
multimodal variational encoder-decoder (MMVED) framework for micro-video
popularity prediction. MMVED learns a stochastic Gaussian embedding of a
micro-video that is informative about its popularity level while simultaneously
preserving the inherent uncertainties. Moreover, through the optimization of a
deep variational information bottleneck lower bound (IBLBO), the learned hidden
representation is shown to be maximally expressive of the popularity target
while maximally compressive of the noise in micro-video features. Furthermore,
the Bayesian product-of-experts principle is applied to the multimodal encoder,
so that the decision to keep or discard information is made jointly across all
available modalities. Extensive experiments conducted on a public dataset and a
dataset we collected from Xigua demonstrate the effectiveness of the proposed
MMVED framework.
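For orientation, here is a minimal sketch of the standard deep variational information bottleneck lower bound (in the form popularized by Alemi et al., 2017); the paper's IBLBO is presumably of this general shape, though its exact form is not reproduced here. With encoder q(z|x), decoder p(y|z), variational marginal r(z), and trade-off weight β, maximizing this objective lower-bounds I(Z;Y) − β·I(Z;X) up to an additive constant:

```latex
\mathcal{L}_{\mathrm{IB}}
  = \underbrace{\mathbb{E}_{p(x,y)}\,\mathbb{E}_{q(z \mid x)}\big[\log p(y \mid z)\big]}_{\text{expressive of the target } y}
  \;-\;
  \beta \,\underbrace{\mathbb{E}_{p(x)}\,\mathrm{KL}\big(q(z \mid x)\,\big\|\,r(z)\big)}_{\text{compressive of the input } x}
```

The first term rewards embeddings that predict the popularity target; the KL penalty squeezes out input noise, matching the "maximally expressive / maximally compressive" trade-off described in the abstract.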
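As a concrete illustration of the two encoder-side ideas (the stochastic Gaussian embedding and the product-of-experts fusion), here is a minimal PyTorch sketch. All module names, layer sizes, and the unit-Gaussian prior expert are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn


class PoEMultimodalEncoder(nn.Module):
    """Sketch of a multimodal Gaussian encoder fused by a product of experts.

    Each modality m yields an expert N(mu_m, var_m); a product of Gaussian
    experts is again Gaussian and is combined by precision weighting. A unit
    Gaussian "prior expert" keeps the product well defined when a modality
    carries little information. All sizes here are illustrative assumptions.
    """

    def __init__(self, modality_dims, latent_dim=64):
        super().__init__()
        self.latent_dim = latent_dim
        # One small head per modality, emitting (mu_m, log_var_m).
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 2 * latent_dim))
            for d in modality_dims
        )

    def forward(self, modality_feats):
        # Prior expert N(0, I), stored as (mean, precision).
        ref = modality_feats[0]
        mus = [ref.new_zeros(ref.size(0), self.latent_dim)]
        precs = [ref.new_ones(ref.size(0), self.latent_dim)]
        for head, x in zip(self.heads, modality_feats):
            mu, log_var = head(x).chunk(2, dim=-1)
            mus.append(mu)
            precs.append(torch.exp(-log_var))  # precision = 1 / variance
        # Product of Gaussian experts: precisions add; means are precision-weighted.
        prec = torch.stack(precs).sum(dim=0)
        mu = torch.stack([m * p for m, p in zip(mus, precs)]).sum(dim=0) / prec
        var = 1.0 / prec
        # Reparameterization trick: stochastic Gaussian embedding z = mu + sigma * eps.
        z = mu + var.sqrt() * torch.randn_like(mu)
        return z, mu, var
```

At training time a decoder network p(y|z) would regress the popularity target from z, and the KL term of the bound above has a closed form given (mu, var) and the unit-Gaussian prior.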
Related papers
- Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation [97.82707398481273]
We develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF).
Based on the meta information extracted from the multimodal features of the input task, MetaMMF parameterizes a neural network as the item-specific fusion function via a meta learner.
We perform extensive experiments on three benchmark datasets, demonstrating significant improvements over several state-of-the-art multimodal recommendation models.
arXiv Detail & Related papers (2025-01-13T07:51:43Z)
- Multi-Aggregator Time-Warping Heterogeneous Graph Neural Network for Personalized Micro-Video Recommendation [3.0734655107038713]
Graph Neural Network-based micro-video recommendation has shown performance improvements across many kinds of recommendation tasks.
In this paper, a novel Multi-aggregator Time-warping Heterogeneous Graph Neural Network (MTHGNN) is proposed for personalized recommendation of news-style micro-videos.
arXiv Detail & Related papers (2025-01-05T21:14:35Z)
- MUFM: A Mamba-Enhanced Feedback Model for Micro Video Popularity Prediction [1.7040391128945196]
We introduce a framework for capturing long-term dependencies in user feedback and dynamic event interactions.
Our experiments on a large-scale open-source multi-modal dataset show that our model significantly outperforms state-of-the-art approaches by 23.2%.
arXiv Detail & Related papers (2024-11-23T05:13:27Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and the query.
Experimental results on an existing multi-modal video summarization dataset show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- Multi-queue Momentum Contrast for Microvideo-Product Retrieval [57.527227171945796]
We formulate the microvideo-product retrieval task, the first attempt to explore retrieval between multi-modal instances.
A novel Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval.
A discriminative selection strategy with a multi-queue is used to distinguish the importance of different negatives based on their categories.
arXiv Detail & Related papers (2022-12-22T03:47:14Z)
- Modeling High-order Interactions across Multi-interests for Micro-video Recommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance users' interest representations.
In particular, we first use co-attention to model correlation patterns across different levels, and then use self-attention to model correlation patterns within a specific level (a minimal sketch follows this list).
arXiv Detail & Related papers (2021-04-01T07:20:15Z)
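To make the last entry's "co-attention then self-attention" pattern concrete, here is a minimal, hypothetical PyTorch sketch; the module name, shapes, and head count are assumptions for illustration and do not reproduce that paper's exact architecture:

```python
import torch
import torch.nn as nn


class SelfOverCoAttention(nn.Module):
    """Illustrative sketch: co-attention across two interest levels,
    then self-attention within the refined level (not the paper's exact design)."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.co_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, level_a, level_b):
        # Co-attention: level-a items attend to level-b items,
        # modeling correlation patterns across levels.
        cross, _ = self.co_attn(level_a, level_b, level_b)
        # Self-attention: model correlation patterns within the refined level.
        refined, _ = self.self_attn(cross, cross, cross)
        return refined
```

The two stages mirror the summary's description: cross-level correlations are captured first, and the result is then refined by within-level correlations.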
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.