Multimodal Pretraining and Generation for Recommendation: A Tutorial
- URL: http://arxiv.org/abs/2405.06927v1
- Date: Sat, 11 May 2024 06:15:22 GMT
- Title: Multimodal Pretraining and Generation for Recommendation: A Tutorial
- Authors: Jieming Zhu, Chuhan Wu, Rui Zhang, Zhenhua Dong,
- Abstract summary: The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications.
It aims to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape.
- Score: 54.07497722719509
- License:
- Abstract: Personalized recommendation stands as a ubiquitous channel for users to explore information or items aligned with their interests. Nevertheless, prevailing recommendation models predominantly rely on unique IDs and categorical features for user-item matching. While this ID-centric approach has witnessed considerable success, it falls short in comprehensively grasping the essence of raw item contents across diverse modalities, such as text, image, audio, and video. This underutilization of multimodal data poses a limitation to recommender systems, particularly in the realm of multimedia services like news, music, and short-video platforms. The recent surge in pretraining and generation techniques presents both opportunities and challenges in the development of multimodal recommender systems. This tutorial seeks to provide a thorough exploration of the latest advancements and future trajectories in multimodal pretraining and generation techniques within the realm of recommender systems. The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications and open challenges in the field of recommendation. Our target audience encompasses scholars, practitioners, and other parties interested in this domain. By providing a succinct overview of the field, we aspire to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape.
Related papers
- Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond [51.141270065306514]
This tutorial aims to equip researchers, practitioners, and newcomers with the knowledge and skills to leverage multimodal AI.
We will cover the latest multimodal datasets and pretrained models, including those beyond vision and language.
Hands-on laboratories will offer practical experience with state-of-the-art multimodal models.
arXiv Detail & Related papers (2024-10-08T01:41:56Z) - Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation [9.506245109666907]
Multi-faceted features characterizing products and services may influence each customer on online selling platforms differently.
The common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, and (iv) predicting the user-item score.
This paper settles as the first attempt to offer a large-scale benchmarking for multimodal recommender systems, with a specific focus on multimodal extractors.
arXiv Detail & Related papers (2024-09-24T08:29:10Z) - Multi-Tower Multi-Interest Recommendation with User Representation Repel [0.9867914513513453]
We propose a novel multi-tower multi-interest framework with user representation repel.
Experimental results across multiple large-scale industrial datasets proved the effectiveness and generalizability of our proposed framework.
arXiv Detail & Related papers (2024-03-08T07:36:14Z) - Mirror Gradient: Towards Robust Multimodal Recommender Systems via
Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima.
We propose a concise yet effective gradient strategy called Mirror Gradient (MG)
We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
arXiv Detail & Related papers (2024-02-17T12:27:30Z) - Generative Multi-Modal Knowledge Retrieval with Large Language Models [75.70313858231833]
We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
arXiv Detail & Related papers (2024-01-16T08:44:29Z) - Recommender Systems in the Era of Large Language Models (LLMs) [62.0129013439038]
Large Language Models (LLMs) have revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI)
We conduct a comprehensive review of LLM-empowered recommender systems from various aspects including Pre-training, Fine-tuning, and Prompting.
arXiv Detail & Related papers (2023-07-05T06:03:40Z) - Multimodal Recommender Systems: A Survey [50.23505070348051]
Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently.
In this paper, we will give a comprehensive survey of the MRS models, mainly from technical views.
To access more details of the surveyed papers, such as implementation code, we open source a repository.
arXiv Detail & Related papers (2023-02-08T05:12:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.