Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights
- URL: http://arxiv.org/abs/2407.19467v1
- Date: Sun, 28 Jul 2024 11:36:47 GMT
- Title: Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights
- Authors: Xiang-Rong Sheng, Feifan Yang, Litong Gong, Biao Wang, Zhangming Chan, Yujing Zhang, Yueyao Cheng, Yong-Nan Zhu, Tiezheng Ge, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng,
- Abstract summary: We explore approaches to leverage multimodal data to enhance the recommendation accuracy.
We introduce a two-phase framework, including the pre-training of multimodal representations and the integration of these representations with existing ID-based models.
Since the integration of multimodal representations in mid-2023, we have observed significant performance improvements in Taobao display advertising system.
- Score: 38.59216578324812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance the recommendation accuracy. We start from identifying the key challenges in adopting multimodal data in a manner that is both effective and cost-efficient for industrial systems. To address these challenges, we introduce a two-phase framework, including: 1) the pre-training of multimodal representations to capture semantic similarity, and 2) the integration of these representations with existing ID-based models. Furthermore, we detail the architecture of our production system, which is designed to facilitate the deployment of multimodal representations. Since the integration of multimodal representations in mid-2023, we have observed significant performance improvements in Taobao display advertising system. We believe that the insights we have gathered will serve as a valuable resource for practitioners seeking to leverage multimodal data in their systems.
Related papers
- Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond [48.43910061720815]
Multi-modal generative AI has received increasing attention in both academia and industry.
One natural question arises: Is it possible to have a unified model for both understanding and generation?
arXiv Detail & Related papers (2024-09-23T13:16:09Z) - MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework.
MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution.
Our approach reaches state-of-the-art (SOTA) performance in nine tasks using significantly less data compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z) - A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation [9.720586396359906]
We argue that existing multi-modal recommender systems typically use isolated processes for both feature extraction and modality modelling.
We propose a novel model, called Unified Multi-modal Graph Transformer (UGT), which leverages a multi-way transformer to extract aligned multi-modal features.
We show that the UGT model can achieve significant effectiveness gains, especially when jointly optimised with the commonly-used multi-modal recommendation losses.
arXiv Detail & Related papers (2024-07-29T11:04:31Z) - SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation [61.392147185793476]
We present a unified and versatile foundation model, namely, SEED-X.
SEED-X is able to model multi-granularity visual semantics for comprehension and generation tasks.
We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.
arXiv Detail & Related papers (2024-04-22T17:56:09Z) - Multi-Tower Multi-Interest Recommendation with User Representation Repel [0.9867914513513453]
We propose a novel multi-tower multi-interest framework with user representation repel.
Experimental results across multiple large-scale industrial datasets proved the effectiveness and generalizability of our proposed framework.
arXiv Detail & Related papers (2024-03-08T07:36:14Z) - Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z) - Quantifying & Modeling Multimodal Interactions: An Information
Decomposition Framework [89.8609061423685]
We propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task.
To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks.
We demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies.
arXiv Detail & Related papers (2023-02-23T18:59:05Z) - Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal
Prediction for Multimodal Sentiment Analysis [19.07020276666615]
We propose a novel framework named MultiModal Contrastive Learning (MMCL) for multimodal representation to capture intra- and inter-modality dynamics simultaneously.
We also design two contrastive learning tasks, instance- and sentiment-based contrastive learning, to promote the process of prediction and learn more interactive information related to sentiment.
arXiv Detail & Related papers (2022-10-26T08:24:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.