Predicting the Popularity of Micro-videos with Multimodal Variational
Encoder-Decoder Framework
- URL: http://arxiv.org/abs/2003.12724v1
- Date: Sat, 28 Mar 2020 06:08:16 GMT
- Title: Predicting the Popularity of Micro-videos with Multimodal Variational
Encoder-Decoder Framework
- Authors: Yaochen Zhu, Jiayi Xie, Zhenzhong Chen
- Abstract summary: We propose a multimodal variational encoder-decoder framework for micro-video popularity tasks.
MMVED learns a stochastic Gaussian embedding of a micro-video that is informative about its popularity level.
Experiments conducted on a public dataset and a dataset collected from Xigua demonstrate the effectiveness of the proposed MMVED framework.
- Score: 54.194340961353944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an emerging type of user-generated content, micro-videos drastically enrich people's entertainment experiences and social interactions. However, the popularity pattern of an individual micro-video remains elusive to researchers. One of the major challenges is that the potential popularity of a micro-video tends to fluctuate under the impact of various external factors, which makes it full of uncertainties. In addition, since micro-videos are mainly uploaded by individuals who lack professional techniques, multiple types of noise can exist that obscure useful information. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework for micro-video popularity prediction tasks. MMVED learns a stochastic Gaussian embedding of a micro-video that is informative about its popularity level while simultaneously preserving the inherent uncertainties. Moreover, through the optimization of a deep variational information bottleneck lower bound (IBLBO), the learned hidden representation is shown to be maximally expressive about the popularity target while maximally compressive of the noise in micro-video features. Furthermore, the Bayesian product-of-experts principle is applied to the multimodal encoder, so that the decision to keep or discard information is made jointly across all available modalities. Extensive experiments conducted on a public dataset and a dataset we collected from Xigua demonstrate the effectiveness of the proposed MMVED framework.
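The abstract names two concrete mechanisms: a Bayesian product-of-experts fusion of per-modality Gaussian posteriors, and an information-bottleneck-style objective that keeps the latent embedding predictive of popularity while compressing away feature noise. The sketch below illustrates both ideas; it is a minimal PyTorch illustration under assumed dimensions, an assumed mean-squared-error prediction term, and a hypothetical weight `beta`, not the authors' IBLBO derivation or network architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussians N(mu_m, var_m) into a single Gaussian.

    Under the product-of-experts rule, precisions (1/var) add; a standard
    normal N(0, I) is included as an extra prior expert.
    """
    precisions = [torch.exp(-lv) for lv in logvars]         # 1 / var_m
    total_precision = 1.0 + sum(precisions)                 # prior expert adds 1
    var = 1.0 / total_precision
    mu = var * sum(m * p for m, p in zip(mus, precisions))  # prior mean is 0
    return mu, torch.log(var)


class MultimodalVED(nn.Module):
    """Toy multimodal variational encoder-decoder for popularity regression."""

    def __init__(self, modality_dims, latent_dim=32):
        super().__init__()
        # One Gaussian encoder per modality, each emitting (mu, logvar).
        self.encoders = nn.ModuleList(
            [nn.Linear(d, 2 * latent_dim) for d in modality_dims]
        )
        self.decoder = nn.Linear(latent_dim, 1)  # decodes z into a popularity score

    def forward(self, features):
        mus, logvars = [], []
        for enc, x in zip(self.encoders, features):
            mu, logvar = enc(x).chunk(2, dim=-1)
            mus.append(mu)
            logvars.append(logvar)
        mu, logvar = product_of_experts(mus, logvars)
        # Reparameterized sample keeps the embedding stochastic during training.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z).squeeze(-1), mu, logvar


def ib_style_loss(pred, target, mu, logvar, beta=1e-3):
    # Prediction term keeps z informative about popularity; the KL term to
    # N(0, I) compresses away feature noise. `beta` trades the two off.
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=-1)
    return F.mse_loss(pred, target) + beta * kl.mean()
```

For example, with hypothetical 128-d visual, 128-d acoustic, and 300-d textual features, `MultimodalVED([128, 128, 300])` fuses three Gaussian experts into one posterior; at test time the posterior mean `mu` can be decoded deterministically, mirroring the stochastic-embedding-plus-decoder split described in the abstract.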
Related papers
- Dreaming User Multimodal Representation Guided by The Platonic Representation Hypothesis for Micro-Video Recommendation [1.8604168495693911]
We introduce DreamUMM, a novel approach leveraging user historical behaviors to create real-time user representations in a multimodal space.
DreamUMM employs a closed-form solution correlating user video preferences with multimodal similarity, hypothesizing that user interests can be effectively represented in a unified multimodal space.
Our work contributes to the ongoing exploration of representational convergence by providing empirical evidence supporting the potential for user interest representations to reside in a multimodal space.
arXiv Detail & Related papers (2024-09-15T06:40:38Z)
- Orthogonal Hyper-category Guided Multi-interest Elicitation for Micro-video Matching [43.79560010763052]
We propose a model named OPAL for micro-video matching.
OPAL elicits a user's multiple heterogeneous interests by disentangling multiple soft and hard interest embeddings.
OPAL outperforms six state-of-the-art models in terms of recall and hit rate.
arXiv Detail & Related papers (2024-07-20T03:41:57Z)
- Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent.
Our method mitigates the performance loss, reducing it from its original $\sim 30\%$ drop to only $\sim 10\%$ when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- Denoising Bottleneck with Mutual Information Maximization for Video Multimodal Fusion [30.631733395175765]
Video multimodal fusion aims to integrate multimodal signals in videos.
Videos have longer multimodal sequences with more redundancy and noise in the visual and audio modalities.
We propose a denoising bottleneck fusion model for fine-grained video fusion.
arXiv Detail & Related papers (2023-05-24T02:39:43Z)
- Multi-queue Momentum Contrast for Microvideo-Product Retrieval [57.527227171945796]
We formulate the microvideo-product retrieval task, which is the first attempt to explore retrieval between two types of multi-modal instances (micro-videos and products).
A novel approach named Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval.
A discriminative selection strategy with a multi-queue is used to distinguish the importance of different negatives based on their categories.
arXiv Detail & Related papers (2022-12-22T03:47:14Z)
- Modeling High-order Interactions across Multi-interests for Micro-video Recommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance the user's interest representation.
In particular, we first use co-attention to model correlation patterns across different levels and then use self-attention to model correlation patterns within a specific level.
arXiv Detail & Related papers (2021-04-01T07:20:15Z)