A Survey on Multimodal Recommender Systems: Recent Advances and Future Directions
- URL: http://arxiv.org/abs/2502.15711v1
- Date: Wed, 22 Jan 2025 12:00:35 GMT
- Title: A Survey on Multimodal Recommender Systems: Recent Advances and Future Directions
- Authors: Jinfeng Xu, Zheyu Chen, Shuo Yang, Jinze Li, Wei Wang, Xiping Hu, Steven Hoi, Edith Ngai
- Abstract summary: This paper comprehensively reviews recent research advancements in Multimodal Recommender Systems. We introduce the existing MRS models by categorizing them into four key areas: Feature Extraction, Encoder, Multimodal Fusion, and Loss Function. We hope to contribute to developing a more sophisticated and effective multimodal recommender system.
- Score: 16.652996189513658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acquiring valuable data from the rapidly expanding information on the internet has become a significant concern, and recommender systems have emerged as a widely used and effective tool for helping users discover items of interest. The essence of recommender systems lies in their ability to predict users' ratings or preferences for various items and subsequently recommend the most relevant ones based on historical interaction data and publicly available information. With the advent of diverse multimedia services, including text, images, video, and audio, humans can perceive the world through multiple modalities. Consequently, a recommender system capable of understanding and interpreting different modal data can more effectively infer individual preferences. Multimodal Recommender Systems (MRS) not only capture implicit interaction information across multiple modalities but also have the potential to uncover hidden relationships between these modalities. The primary objective of this survey is to comprehensively review recent research advancements in MRS and to analyze the models from a technical perspective. Specifically, we aim to summarize the general process and main challenges of MRS from a technical perspective. We then introduce the existing MRS models by categorizing them into four key areas: Feature Extraction, Encoder, Multimodal Fusion, and Loss Function. Finally, we further discuss potential future directions for developing and enhancing MRS. This survey serves as a comprehensive guide for researchers and practitioners in the MRS field, providing insights into the current state of MRS technology and identifying areas for future research. We hope to contribute to developing a more sophisticated and effective multimodal recommender system. To access more details of this paper, we open source a repository: https://github.com/Jinfeng-Xu/Awesome-Multimodal-Recommender-Systems.
Related papers
- A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms [36.88050794621219]
This survey provides a comprehensive overview of the Foundation Models for Recommender Systems (FM4RecSys).
We first review the data foundations of RS, from traditional explicit or implicit feedback to multimodal content sources.
We then introduce FMs and their capabilities for representation learning, natural language understanding, and multi-modal reasoning in RS contexts.
(2025-04-23T05:02:51Z)
- Composed Multi-modal Retrieval: A Survey of Approaches and Applications [17.316062338546544]
Composed Multi-modal Retrieval (CMR) enables users to retrieve images or videos by integrating a reference visual input with textual modifications.
CMR is poised to become a pivotal technology in next-generation retrieval systems.
(2025-03-03T09:18:43Z)
- Multimodal Pretraining and Generation for Recommendation: A Tutorial [54.07497722719509]
The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications.
It aims to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape.
(2024-05-11T06:15:22Z)
- Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima.
We propose a concise yet effective gradient strategy called Mirror Gradient (MG).
We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
(2024-02-17T12:27:30Z)
- Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present the Multi-Modal Reasoning (COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
(2023-07-24T08:58:25Z)
- Recommender Systems in the Era of Large Language Models (LLMs) [62.0129013439038]
Large Language Models (LLMs) have revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI).
We conduct a comprehensive review of LLM-empowered recommender systems from various aspects, including Pre-training, Fine-tuning, and Prompting.
(2023-07-05T06:03:40Z)
- How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLMs) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
(2023-06-09T11:31:50Z)
- Multimodal Recommender Systems: A Survey [50.23505070348051]
Multimodal Recommender Systems (MRS) have attracted much attention from both academia and industry recently.
In this paper, we give a comprehensive survey of MRS models, mainly from a technical perspective.
To access more details of the surveyed papers, such as implementation code, we open source a repository.
(2023-02-08T05:12:54Z)
- INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision] [30.411996388471817]
INODE is an end-to-end data exploration system.
We demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics.
(2021-04-09T05:04:04Z)