Ranking Across Different Content Types: The Robust Beauty of Multinomial Blending
- URL: http://arxiv.org/abs/2408.09168v1
- Date: Sat, 17 Aug 2024 11:11:31 GMT
- Title: Ranking Across Different Content Types: The Robust Beauty of Multinomial Blending
- Authors: Jan Malte Lichtenberg, Giuseppe Di Benedetto, Matteo Ruffini,
- Abstract summary: Cross-content-type ranking poses a significant challenge for traditional learning-to-rank algorithms.
We explore a simple method for cross-content-type ranking, called multinomial blending (MB).
We report the results of an A/B test from an Amazon Music ranking use-case.
- Score: 1.0650780147044159
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasing number of media streaming services have expanded their offerings to include entities of multiple content types. For instance, audio streaming services that started by offering music only, now also offer podcasts, merchandise items, and videos. Ranking items across different content types into a single slate poses a significant challenge for traditional learning-to-rank (LTR) algorithms due to differing user engagement patterns for different content types. We explore a simple method for cross-content-type ranking, called multinomial blending (MB), which can be used in conjunction with most existing LTR algorithms. We compare MB to existing baselines not only in terms of ranking quality but also from other industry-relevant perspectives such as interpretability, ease-of-use, and stability in dynamic environments with changing user behavior and ranking model retraining. Finally, we report the results of an A/B test from an Amazon Music ranking use-case.
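The abstract does not spell out the blending algorithm, but a minimal sketch of how multinomial blending could work is: for each slot in the slate, sample a content type from a fixed multinomial distribution, then fill the slot with the highest-ranked remaining item of that type. The function name, the per-type ranked lists, and the sampling probabilities below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def multinomial_blend(ranked_by_type, type_probs, slate_size, seed=None):
    """Blend per-content-type rankings into a single slate.

    For each slot, sample a content type from a multinomial
    distribution (type_probs), then fill the slot with the
    highest-ranked remaining item of that type.
    """
    rng = random.Random(seed)
    # Copy the ranked lists so we can pop consumed items.
    pools = {t: list(items) for t, items in ranked_by_type.items()}
    slate = []
    while len(slate) < slate_size and any(pools.values()):
        # Restrict sampling to types that still have items left.
        types = [t for t in pools if pools[t]]
        weights = [type_probs[t] for t in types]
        chosen = rng.choices(types, weights=weights, k=1)[0]
        slate.append(pools[chosen].pop(0))
    return slate

# Hypothetical example: blend music tracks and podcast episodes.
slate = multinomial_blend(
    {"music": ["m1", "m2", "m3"], "podcast": ["p1", "p2"]},
    {"music": 0.7, "podcast": 0.3},
    slate_size=4,
    seed=42,
)
```

Because the sampling probabilities are set per content type rather than learned per item, this scheme is easy to interpret and tune, which is consistent with the industry-relevant properties (interpretability, ease-of-use, stability) the abstract highlights.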
Related papers
- Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning [56.873534081386]
A new task, HIREST, is presented, covering video retrieval, moment retrieval, moment segmentation, and step-captioning.
We propose a query-centric audio-visual cognition network to construct a reliable multi-modal representation for three tasks.
This can cognize user-preferred content and thus attain a query-centric audio-visual representation for three tasks.
arXiv Detail & Related papers (2024-12-18T06:43:06Z) - Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization.
Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette": a collection of independent and semantically parallel embeddings.
Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z) - Cross-view Semantic Alignment for Livestreaming Product Recognition [24.38606354376169]
We present LPR4M, a large-scale multimodal dataset that covers 34 categories.
LPR4M contains diverse videos and noise modality pairs while exhibiting a long-tailed distribution.
A novel Patch Feature Reconstruction loss is proposed to penalize the semantic misalignment between cross-view patches.
arXiv Detail & Related papers (2023-08-09T12:23:41Z) - ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer [31.10377461705053]
We propose a ContentCTR model that leverages multimodal transformer for frame-level CTR prediction.
We conduct extensive experiments on both real-world scenarios and public datasets, and our ContentCTR model outperforms traditional recommendation models in capturing real-time content changes.
arXiv Detail & Related papers (2023-06-26T03:04:53Z) - Multi-Content Time-Series Popularity Prediction with Multiple-Model Transformers in MEC Networks [34.44384973176474]
Coded/uncoded content placement in Mobile Edge Caching (MEC) has evolved to meet the significant growth of global mobile data traffic.
Most existing data-driven popularity prediction models are not suitable for coded/uncoded content placement frameworks.
We develop a Multiple-model (hybrid) Transformer-based Edge Caching (MTEC) framework with higher generalization ability.
arXiv Detail & Related papers (2022-10-12T02:24:49Z) - Temporal Saliency Query Network for Efficient Video Recognition [82.52760040577864]
Video recognition is a hot-spot research topic with the explosive growth of multimedia data on the Internet and mobile devices.
Most existing methods select the salient frames without awareness of the class-specific saliency scores.
We propose a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement.
arXiv Detail & Related papers (2022-07-21T09:23:34Z) - Self-Supervised Representation Learning With MUlti-Segmental Informational Coding (MUSIC) [6.693379403133435]
Self-supervised representation learning maps high-dimensional data into a meaningful embedding space.
We propose MUlti-Segmental Informational Coding (MUSIC) for self-supervised representation learning.
arXiv Detail & Related papers (2022-06-13T20:37:48Z) - Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z) - Diverse Preference Augmentation with Multiple Domains for Cold-start Recommendations [92.47380209981348]
We propose a Diverse Preference Augmentation framework with multiple source domains based on meta-learning.
We generate diverse ratings in a new domain of interest to handle overfitting on the case of sparse interactions.
These ratings are introduced into the meta-training procedure to learn a preference meta-learner, which produces good generalization ability.
arXiv Detail & Related papers (2022-04-01T10:10:50Z) - Content-based Analysis of the Cultural Differences between TikTok and Douyin [95.32409577885645]
Short-form video social media shifts away from the traditional media paradigm by telling the audience a dynamic story to attract their attention.
In particular, different combinations of everyday objects can be employed to represent a unique scene that is both interesting and understandable.
Offered by the same company, TikTok and Douyin are prominent examples of this new media format.
Our research primarily targets the hypothesis that the two platforms express cultural differences, together with differences in media fashion and social idiosyncrasy.
arXiv Detail & Related papers (2020-11-03T01:47:49Z) - Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z) - Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework.
We report performance gains over the state-of-the-art on several downstream tasks, including video classification (EPIC-Kitchens), question answering (TVQA), and captioning (TVC, YouCook2, and MSR-VTT).
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.