Multimodal Recommender Systems: A Survey
- URL: http://arxiv.org/abs/2302.03883v2
- Date: Wed, 4 Sep 2024 14:00:42 GMT
- Title: Multimodal Recommender Systems: A Survey
- Authors: Qidong Liu, Jiaxi Hu, Yutian Xiao, Xiangyu Zhao, Jingtong Gao, Wanyu Wang, Qing Li, Jiliang Tang,
- Abstract summary: Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently.
In this paper, we will give a comprehensive survey of the MRS models, mainly from technical views.
To access more details of the surveyed papers, such as implementation code, we open source a repository.
- Score: 50.23505070348051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recommender system (RS) has been an integral toolkit of online services. They are equipped with various deep learning techniques to model user preference based on identifier and attribute information. With the emergence of multimedia services, such as short videos, news and etc., understanding these contents while recommending becomes critical. Besides, multimodal features are also helpful in alleviating the problem of data sparsity in RS. Thus, Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently. In this paper, we will give a comprehensive survey of the MRS models, mainly from technical views. First, we conclude the general procedures and major challenges for MRS. Then, we introduce the existing MRS models according to four categories, i.e., Modality Encoder, Feature Interaction, Feature Enhancement and Model Optimization. Besides, to make it convenient for those who want to research this field, we also summarize the dataset and code resources. Finally, we discuss some promising future directions of MRS and conclude this paper. To access more details of the surveyed papers, such as implementation code, we open source a repository.
Related papers
- MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce [42.3177388371158]
Current Embedding-based Retrieval Systems embed queries and items into a shared low-dimensional space.
We propose MRSE, a Multi-modality Retrieval System that integrates text, item images, and user preferences.
MRSE achieves an 18.9% improvement in offline relevance and a 3.7% gain in online core metrics compared to Shopee's state-of-the-art uni-modality system.
arXiv Detail & Related papers (2024-08-27T11:21:19Z) - An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models [21.892975397847316]
We present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph index.
One notable aspect of MQA is its utilization of contrastive learning to assess the significance of different modalities.
The system achieves efficient retrieval through our advanced navigation graph index, refined using computational pruning techniques.
arXiv Detail & Related papers (2024-07-05T02:01:49Z) - Multimodal Pretraining and Generation for Recommendation: A Tutorial [54.07497722719509]
The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications.
It aims to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape.
arXiv Detail & Related papers (2024-05-11T06:15:22Z) - A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) [57.30228361181045]
This survey connects key advancements in recommender systems using Generative Models (Gen-RecSys)
It covers: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS.
Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges.
arXiv Detail & Related papers (2024-03-31T06:57:57Z) - An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders [3.1093882314734285]
Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions.
While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals, like text and images, has inspired researchers to delve into constructing SR from multi-modal information without using IDs.
This paper introduces a simple and universal textbfMulti-textbfModal textbfSequential textbfRecommendation (textbfMMSR) framework.
arXiv Detail & Related papers (2024-03-26T04:16:57Z) - Mirror Gradient: Towards Robust Multimodal Recommender Systems via
Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima.
We propose a concise yet effective gradient strategy called Mirror Gradient (MG)
We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
arXiv Detail & Related papers (2024-02-17T12:27:30Z) - Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset
and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present Multi-Modal Reasoning(COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z) - How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z) - Automated Machine Learning for Deep Recommender Systems: A Survey [25.942427065983754]
This article will give a comprehensive summary of automated machine learning (AutoML) for developing DRS models.
We first provide an overview of AutoML for DRS models and the related techniques.
Then we discuss the state-of-the-art AutoML approaches that automate the feature selection, feature embeddings, feature interactions, and system design in DRS.
arXiv Detail & Related papers (2022-04-04T11:17:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.