Related papers: Multimodal Recommender Systems: A Survey

URL: http://arxiv.org/abs/2302.03883v2
Date: Wed, 4 Sep 2024 14:00:42 GMT
Title: Multimodal Recommender Systems: A Survey
Authors: Qidong Liu, Jiaxi Hu, Yutian Xiao, Xiangyu Zhao, Jingtong Gao, Wanyu Wang, Qing Li, Jiliang Tang
Abstract summary: Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently. In this paper, we will give a comprehensive survey of the MRS models, mainly from technical views. To access more details of the surveyed papers, such as implementation code, we open source a repository.
Score: 50.23505070348051
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recommender system (RS) has been an integral toolkit of online services. It is equipped with various deep learning techniques to model user preference based on identifier and attribute information. With the emergence of multimedia services, such as short videos and news, understanding this content while recommending becomes critical. Besides, multimodal features are also helpful in alleviating the problem of data sparsity in RS. Thus, Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently. In this paper, we will give a comprehensive survey of the MRS models, mainly from technical views. First, we summarize the general procedures and major challenges for MRS. Then, we introduce the existing MRS models according to four categories, i.e., Modality Encoder, Feature Interaction, Feature Enhancement and Model Optimization. Besides, to make it convenient for those who want to research this field, we also summarize the dataset and code resources. Finally, we discuss some promising future directions of MRS and conclude this paper. To access more details of the surveyed papers, such as implementation code, we open source a repository.

Related papers

A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms [36.88050794621219]
This survey provides a comprehensive overview of the Foundation Models for Recommender Systems (FM4RecSys). We first review the data foundations of RS, from traditional explicit or implicit feedback to multimodal content sources. We then introduce FMs and their capabilities for representation learning, natural language understanding, and multi-modal reasoning in RS contexts.
arXiv Detail & Related papers (2025-04-23T05:02:51Z)
Composed Multi-modal Retrieval: A Survey of Approaches and Applications [17.316062338546544]
Composed Multi-modal Retrieval (CMR) enables users to retrieve images or videos by integrating a reference visual input with textual modifications. CMR is poised to become a pivotal technology in next-generation retrieval systems.
arXiv Detail & Related papers (2025-03-03T09:18:43Z)
A Survey on Multimodal Recommender Systems: Recent Advances and Future Directions [16.652996189513658]
This paper comprehensively reviews recent research advancements in Multimodal Recommender Systems. We introduce the existing MRS models by categorizing them into four key areas: Feature Extraction, Encoder, Multimodal Fusion, and Loss Function. We hope to contribute to developing a more sophisticated and effective multimodal recommender system.
arXiv Detail & Related papers (2025-01-22T12:00:35Z)
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation [57.49045064294086]
Large Language Model (LLM) has the ability to capture semantic relationships between items, independent of their popularity. We introduce LLMEmb, a novel method leveraging LLM to generate item embeddings that enhance Sequential Recommender Systems (SRS) performance.
arXiv Detail & Related papers (2024-09-30T03:59:06Z)
MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce [42.3177388371158]
Current Embedding-based Retrieval Systems embed queries and items into a shared low-dimensional space. We propose MRSE, a Multi-modality Retrieval System that integrates text, item images, and user preferences. MRSE achieves an 18.9% improvement in offline relevance and a 3.7% gain in online core metrics compared to Shopee's state-of-the-art uni-modality system.
arXiv Detail & Related papers (2024-08-27T11:21:19Z)
An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models [21.892975397847316]
We present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph index. One notable aspect of MQA is its utilization of contrastive learning to assess the significance of different modalities. The system achieves efficient retrieval through our advanced navigation graph index, refined using computational pruning techniques.
arXiv Detail & Related papers (2024-07-05T02:01:49Z)
Multimodal Pretraining and Generation for Recommendation: A Tutorial [54.07497722719509]
The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications. It aims to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape.
arXiv Detail & Related papers (2024-05-11T06:15:22Z)
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) [57.30228361181045]
This survey connects key advancements in recommender systems using Generative Models (Gen-RecSys). It covers: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS. Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges.
arXiv Detail & Related papers (2024-03-31T06:57:57Z)
An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders [3.1093882314734285]
Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions. While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals, like text and images, has inspired researchers to delve into constructing SR from multi-modal information without using IDs. This paper introduces a simple and universal Multi-Modal Sequential Recommendation (MMSR) framework.
arXiv Detail & Related papers (2024-03-26T04:16:57Z)
Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima. We propose a concise yet effective gradient strategy called Mirror Gradient (MG). We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
arXiv Detail & Related papers (2024-02-17T12:27:30Z)
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence. We present the Multi-Modal Reasoning (COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions. We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities. We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)
Automated Machine Learning for Deep Recommender Systems: A Survey [25.942427065983754]
This article will give a comprehensive summary of automated machine learning (AutoML) for developing DRS models. We first provide an overview of AutoML for DRS models and the related techniques. Then we discuss the state-of-the-art AutoML approaches that automate the feature selection, feature embeddings, feature interactions, and system design in DRS.
arXiv Detail & Related papers (2022-04-04T11:17:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.