MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion
- URL: http://arxiv.org/abs/2407.00056v1
- Date: Sat, 15 Jun 2024 04:59:00 GMT
- Title: MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion
- Authors: Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou, Gaofeng Meng
- Abstract summary: Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue.
Previous studies on live streaming gifting prediction treat this task as a conventional recommendation problem.
We propose MMBee based on real-time Multi-Modal Fusion and Behaviour Expansion to address these issues.
- Score: 18.499672566131355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Live streaming services are becoming increasingly popular due to real-time interactions and entertainment. Viewers can chat and send comments or virtual gifts to express their preferences for the streamers. Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a conventional recommendation problem and model users' preferences using categorical data and observed historical behaviors. However, it is challenging to precisely describe the real-time content changes in live streaming using limited categorical information. Moreover, due to the sparsity of gifting behaviors, capturing the preferences and intentions of users is quite difficult. In this work, we propose MMBee, based on real-time Multi-Modal Fusion and Behaviour Expansion, to address these issues. Specifically, we first present a Multi-modal Fusion Module with Learnable Query (MFQ) to perceive the dynamic content of streaming segments and process complex multi-modal interactions, including images, text comments and speech. To alleviate the sparsity of gifting behaviors, we present a novel Graph-guided Interest Expansion (GIE) approach that learns both user and streamer representations on large-scale gifting graphs with multi-modal attributes. Comprehensive experimental results show that MMBee achieves significant performance improvements on both public datasets and Kuaishou real-world streaming datasets, and its effectiveness has been further validated through online A/B experiments. MMBee has been deployed and is serving hundreds of millions of users at Kuaishou.
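The paper itself ships no code, but the MFQ idea of summarizing a streaming segment with a small set of learnable queries can be sketched as follows. Everything here (the `LearnableQueryFusion` name, dimensions, a single cross-attention block, mean-pooling) is an illustrative assumption, not the authors' implementation:

```python
# Hypothetical sketch of a learnable-query multi-modal fusion module (MFQ-style).
# Dimensions, layer counts and the single cross-attention block are assumptions.
import torch
import torch.nn as nn

class LearnableQueryFusion(nn.Module):
    def __init__(self, dim=256, num_queries=8, num_heads=4):
        super().__init__()
        # A fixed set of learnable queries summarizes the streaming segment.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())

    def forward(self, image_tokens, text_tokens, speech_tokens):
        # Concatenate tokens from all modalities along the sequence axis.
        tokens = torch.cat([image_tokens, text_tokens, speech_tokens], dim=1)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        # Queries attend over the multi-modal token sequence.
        fused, _ = self.cross_attn(q, tokens, tokens)
        # Pool the query outputs into one segment-level embedding.
        return self.ffn(fused).mean(dim=1)

# Example: batches of frame, comment and speech features projected to dim=256.
mfq = LearnableQueryFusion()
seg = mfq(torch.randn(2, 16, 256), torch.randn(2, 32, 256), torch.randn(2, 24, 256))
print(seg.shape)  # torch.Size([2, 256])
```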
Related papers
- Dreaming User Multimodal Representation Guided by The Platonic Representation Hypothesis for Micro-Video Recommendation [1.8604168495693911]
We introduce DreamUMM, a novel approach leveraging user historical behaviors to create real-time user representations in a multimodal space.
DreamUMM employs a closed-form solution correlating user video preferences with multimodal similarity, hypothesizing that user interests can be effectively represented in a unified multimodal space.
Our work contributes to the ongoing exploration of representational convergence by providing empirical evidence that user interest representations can reside in a unified multimodal space.
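The summary above does not spell out the closed form. One plausible reading, sketched below under assumed definitions (engagement scores as the preference signal, a ridge-regularized least-squares fit), is a user vector whose similarity to each video's multimodal embedding best matches the observed preferences:

```python
# Illustrative sketch only: one plausible closed-form reading of "correlating
# user video preferences with multimodal similarity". The engagement signal
# and the ridge formulation are assumptions, not the paper's derivation.
import numpy as np

def user_representation(video_embs: np.ndarray, engagement: np.ndarray,
                        lam: float = 1.0) -> np.ndarray:
    """Solve u = argmin ||V u - y||^2 + lam ||u||^2 in closed form,
    so that the similarity <v_i, u> tracks the observed engagement y_i."""
    V, y = video_embs, engagement          # V: (n_videos, d), y: (n_videos,)
    d = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(d), V.T @ y)

rng = np.random.default_rng(0)
V = rng.normal(size=(50, 64))              # multimodal video embeddings
y = rng.uniform(size=50)                   # e.g. watch-completion ratios
u = user_representation(V, y)
print(np.corrcoef(V @ u, y)[0, 1])         # similarity correlates with preference
```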
arXiv Detail & Related papers (2024-09-15T06:40:38Z)
- GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation [13.1192216083304]
We propose a novel Graphs and User Modalities Enhancement (GUME) framework for long-tail multimodal recommendation.
Specifically, we first enhance the user-item graph using multimodal similarity between items.
We then construct two types of user modalities: explicit interaction features and extended interest features.
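A minimal sketch of the item-side graph enhancement described above, under assumed details (cosine similarity over item multimodal embeddings and a kNN edge rule):

```python
# Assumed sketch of "enhance the user-item graph using multimodal similarity
# between items": add kNN item-item edges from cosine similarity of item
# multimodal embeddings. k and the similarity metric are illustrative choices.
import numpy as np

def knn_item_edges(item_embs: np.ndarray, k: int = 5):
    # Cosine similarity between all item pairs.
    normed = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)           # no self-loops
    edges = []
    for i in range(sim.shape[0]):
        for j in np.argpartition(-sim[i], k)[:k]:
            edges.append((i, int(j), float(sim[i, j])))
    return edges                             # weighted item-item edges to merge
                                             # into the user-item graph

items = np.random.default_rng(1).normal(size=(100, 32))
print(knn_item_edges(items, k=3)[:3])
```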
arXiv Detail & Related papers (2024-07-17T06:29:00Z)
- A Multimodal Transformer for Live Streaming Highlight Prediction [26.787089919015983]
Live streaming requires models to infer without future frames and process complex multimodal interactions.
We introduce a novel Modality Temporal Alignment Module to handle the temporal shift of cross-modal signals.
We propose a novel Border-aware Pairwise Loss to learn from a large-scale dataset and utilize user implicit feedback as a weak supervision signal.
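The exact form of the Border-aware Pairwise Loss is not given here; the sketch below shows one hedged interpretation, a BPR-style pairwise ranking loss that down-weights clip pairs lying close to a highlight border, where labels tend to be noisy. The weighting scheme is an assumption:

```python
# Hedged sketch: pairwise logistic (BPR-style) ranking loss whose weight is
# reduced for clip pairs near a highlight boundary. Not the paper's exact loss.
import torch

def border_aware_pairwise_loss(pos_scores, neg_scores,
                               pos_border_dist, neg_border_dist,
                               tau: float = 5.0):
    # Down-weight pairs whose clips lie near an annotated highlight border.
    w = 1.0 - torch.exp(-(pos_border_dist + neg_border_dist) / tau)
    return -(w * torch.nn.functional.logsigmoid(pos_scores - neg_scores)).mean()

pos = torch.randn(8)                  # model scores for highlight clips
neg = torch.randn(8)                  # scores for non-highlight clips
d_pos = torch.rand(8) * 10            # seconds from the nearest border
d_neg = torch.rand(8) * 10
print(border_aware_pairwise_loss(pos, neg, d_pos, d_neg))
```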
arXiv Detail & Related papers (2024-06-15T04:59:19Z)
- LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation [58.04939553630209]
In real-world systems, most users interact with only a handful of items, while the majority of items are seldom consumed.
These two issues, known as the long-tail user and long-tail item challenges, often pose difficulties for existing Sequential Recommendation systems.
We propose the Large Language Models Enhancement framework for Sequential Recommendation (LLM-ESR) to address these challenges.
arXiv Detail & Related papers (2024-05-31T07:24:42Z)
- Knowledge-Aware Multi-Intent Contrastive Learning for Multi-Behavior Recommendation [6.522900133742931]
Multi-behavior recommendation provides users with more accurate choices based on diverse behaviors, such as viewing, adding to cart, and purchasing.
We propose a novel Knowledge-Aware Multi-Intent Contrastive Learning (KAMCL) model.
KAMCL uses relationships in the knowledge graph to construct intents, aiming to mine the connections among users' multiple behaviors from an intent perspective and thereby achieve more accurate recommendations.
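As a rough illustration of the contrastive component (the pairing and temperature below are assumptions, not KAMCL's exact objective), one can contrast the same user's representations under two behaviors with an InfoNCE loss:

```python
# Assumed sketch of a multi-behavior contrastive term: the same user's
# representations under two behaviors form a positive pair; other users in the
# batch act as negatives (InfoNCE).
import torch
import torch.nn.functional as F

def infonce(z_a: torch.Tensor, z_b: torch.Tensor, temp: float = 0.2):
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.T / temp           # (B, B) cross-behavior similarities
    labels = torch.arange(z_a.size(0))    # diagonal = same user = positive
    return F.cross_entropy(logits, labels)

view_repr = torch.randn(16, 64)           # user reps from the "view" behavior
buy_repr = torch.randn(16, 64)            # user reps from the "purchase" behavior
print(infonce(view_repr, buy_repr))
```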
arXiv Detail & Related papers (2024-04-18T08:39:52Z)
- Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models complex, non-linearly evolving user preferences.
We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
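A minimal sketch of attention modulated by continuous-time gaps, in the spirit of the description above; the decay-style modulation with a single learnable rate is an assumed simplification, not the paper's mechanism:

```python
# Assumed sketch: attention logits penalized by a learnable function of the
# continuous time gaps between interactions.
import torch
import torch.nn as nn

class TimeModulatedAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.decay = nn.Parameter(torch.tensor(0.1))  # learnable decay rate

    def forward(self, x, timestamps):
        # x: (B, T, dim); timestamps: (B, T), e.g. in hours.
        scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
        gaps = (timestamps.unsqueeze(2) - timestamps.unsqueeze(1)).abs()
        # Interactions separated by larger time gaps contribute less.
        scores = scores - torch.nn.functional.softplus(self.decay) * gaps
        return torch.softmax(scores, dim=-1) @ self.v(x)

attn = TimeModulatedAttention()
out = attn(torch.randn(2, 10, 64), torch.sort(torch.rand(2, 10) * 48).values)
print(out.shape)  # torch.Size([2, 10, 64])
```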
arXiv Detail & Related papers (2022-03-30T03:54:11Z)
- A Deep Graph Reinforcement Learning Model for Improving User Experience in Live Video Streaming [7.852895577861326]
We present a deep graph reinforcement learning model to predict and improve the user experience during a live video streaming event.
Our model can significantly increase the number of viewers with a high-quality experience by at least 75% over the first minutes of streaming.
arXiv Detail & Related papers (2021-07-28T19:53:05Z)
- Modeling High-order Interactions across Multi-interests for Micro-video Recommendation [65.16624625748068]
We propose a Self-over-Co Attention module to enhance users' interest representations.
In particular, we first use co-attention to model correlation patterns across different levels and then use self-attention to model correlation patterns within a specific level.
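The co-attention-then-self-attention pattern described above can be sketched as follows; shapes, head counts and the final pooling are illustrative assumptions:

```python
# Assumed sketch of "co-attention across levels, then self-attention within":
# user-interest queries first attend over item features (co-attention), and
# the result is refined with self-attention.
import torch
import torch.nn as nn

class SelfOverCoAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.co_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, interests, item_feats):
        # Co-attention: interest vectors (queries) attend over item features.
        co, _ = self.co_attn(interests, item_feats, item_feats)
        # Self-attention: model correlations within the co-attended interests.
        out, _ = self.self_attn(co, co, co)
        return out.mean(dim=1)             # pooled user interest representation

m = SelfOverCoAttention()
print(m(torch.randn(2, 4, 64), torch.randn(2, 20, 64)).shape)  # (2, 64)
```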
arXiv Detail & Related papers (2021-04-01T07:20:15Z)
- Multi-Interactive Attention Network for Fine-grained Feature Learning in CTR Prediction [48.267995749975476]
In the Click-Through Rate (CTR) prediction scenario, users' sequential behaviors are widely utilized to capture user interest.
Existing methods mostly apply attention to users' behaviors alone, which is not always suitable for CTR prediction.
We propose a Multi-Interactive Attention Network (MIAN) to comprehensively extract the latent relationship among all kinds of fine-grained features.
arXiv Detail & Related papers (2020-12-13T05:46:19Z)
- Disentangled Graph Collaborative Filtering [100.26835145396782]
Disentangled Graph Collaborative Filtering (DGCF) is a new model for learning informative representations of users and items from interaction data.
By modeling a distribution over intents for each user-item interaction, we iteratively refine the intent-aware interaction graphs and representations.
DGCF achieves significant improvements over several state-of-the-art models such as NGCF, DisenGCN, and MacridVAE.
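A simplified sketch of the intent-aware refinement loop described above (sizes, the agreement-based logit update, and the number of iterations are illustrative, not DGCF's exact routing):

```python
# Assumed sketch: each user-item interaction holds logits over K latent
# intents; softmax yields intent-specific edge weights that re-aggregate
# embeddings, and agreement with the refined embeddings updates the logits.
import torch

K, d, n_users, n_items = 4, 16, 6, 8
edges = torch.tensor([[0, 1], [0, 3], [2, 3], [5, 7], [4, 2]])  # (user, item)
u_emb = torch.randn(n_users, K, d)         # intent-chunked user embeddings
i_emb = torch.randn(n_items, K, d)
logits = torch.zeros(len(edges), K)        # per-interaction intent logits

for _ in range(3):                         # iterative refinement (routing)
    w = torch.softmax(logits, dim=1)       # distribution over intents per edge
    agg = torch.zeros_like(u_emb)
    for e, (u, i) in enumerate(edges):
        agg[u] += w[e].unsqueeze(1) * i_emb[i]   # intent-weighted aggregation
    refined = torch.tanh(u_emb + agg)
    for e, (u, i) in enumerate(edges):     # update logits by intent agreement
        logits[e] += (refined[u] * i_emb[i]).sum(dim=1)

print(torch.softmax(logits, dim=1))        # learned intent distribution per edge
```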
arXiv Detail & Related papers (2020-07-03T15:37:25Z)
- Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework [54.194340961353944]
We propose a multimodal variational encoder-decoder (MMVED) framework for micro-video popularity prediction.
MMVED learns a prediction embedding of a micro-video that is informative for its popularity level.
Experiments conducted on a public dataset and a dataset we collect from Xigua demonstrate the effectiveness of the proposed MMVED framework.
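A minimal sketch of the variational encoder-decoder idea (layer sizes and the Gaussian latent with a KL term are standard VAE assumptions, not necessarily MMVED's exact design):

```python
# Assumed sketch: encode fused multimodal features into a Gaussian latent via
# the reparameterization trick, then decode into a popularity prediction.
import torch
import torch.nn as nn

class MMVEDSketch(nn.Module):
    def __init__(self, in_dim=128, z_dim=32):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)
        self.decoder = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 1))

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        return self.decoder(z).squeeze(-1), kl   # popularity score + KL term

model = MMVEDSketch()
fused = torch.randn(4, 128)               # fused multimodal micro-video features
pop, kl = model(fused)
print(pop.shape, kl.item())
```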
arXiv Detail & Related papers (2020-03-28T06:08:16Z)