Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback
- URL: http://arxiv.org/abs/2406.12501v2
- Date: Wed, 02 Apr 2025 06:51:31 GMT
- Title: Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback
- Authors: Guipeng Xv, Xinyu Li, Ruobing Xie, Chen Lin, Chong Liu, Feng Xia, Zhanhui Kang, Leyu Lin
- Abstract summary: We propose Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate multi-modal noise, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities. To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss.
- Score: 32.10029754890383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate multi-modal noise, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities. To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss. Furthermore, DA-MRS implements Alignment guided by User preference to enhance task-specific item representation and Alignment guided by graded Item relations to provide finer-grained alignment. Extensive experiments verify that DA-MRS is a plug-and-play framework and achieves significant and consistent improvements across various datasets, backbone models, and noisy scenarios.
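The abstract gives no reference implementation, but the denoised BPR idea can be sketched as follows: re-weight each observed (user, item) pair by a content-consistency score so that likely-noisy feedback contributes less to the pairwise ranking loss. The function name and weighting scheme below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def denoised_bpr_loss(user_emb, pos_emb, neg_emb, pos_weight):
    """Hypothetical sketch of a denoised BPR loss.

    Standard BPR maximizes sigmoid(s_pos - s_neg) over observed/sampled
    item pairs; here each observed pair is re-weighted by `pos_weight`,
    a [0, 1] proxy for how consistent the item's multi-modal content is
    with the user's preference (low weight -> likely noisy feedback).
    In DA-MRS such weights would come from the cross-modal item-item
    graphs; random values stand in below.
    """
    s_pos = (user_emb * pos_emb).sum(-1)        # score of the observed item
    s_neg = (user_emb * neg_emb).sum(-1)        # score of a sampled negative
    pairwise = -F.logsigmoid(s_pos - s_neg)     # plain BPR term per triple
    return (pos_weight * pairwise).mean()       # noisy pairs contribute less

# toy usage with random embeddings
u, p, n = (torch.randn(32, 64) for _ in range(3))
w = torch.rand(32)                              # stand-in consistency weights
print(denoised_bpr_loss(u, p, n, w))
```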
Related papers
- Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Calibration [3.6854332833964745]
We propose a universal guided in-sync distillation denoising framework for multi-modal recommendation (GUIDER).
Specifically, GUIDER uses a re-calibration strategy to identify clean and noisy interactions from modal content.
It incorporates a Denoising Bayesian Personalized Ranking (DBPR) loss function to handle implicit user feedback.
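The summary does not specify how interactions are re-calibrated; a common way to realize "identify clean and noisy interactions" is small-loss partitioning, sketched below under that assumption. The function name and threshold rule are hypothetical, not GUIDER's published procedure.

```python
import torch

def split_clean_noisy(per_sample_loss, keep_ratio=0.8):
    """Hypothetical small-loss re-calibration: treat the `keep_ratio`
    fraction of interactions with the lowest loss as clean and the
    rest as noisy. A common denoising heuristic used as a stand-in,
    not GUIDER's exact method."""
    k = max(1, int(keep_ratio * per_sample_loss.numel()))
    threshold = per_sample_loss.kthvalue(k).values
    clean_mask = per_sample_loss <= threshold
    return clean_mask, ~clean_mask

losses = torch.rand(100)                 # stand-in per-interaction losses
clean, noisy = split_clean_noisy(losses)
print(clean.sum().item(), noisy.sum().item())
```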
arXiv Detail & Related papers (2025-04-19T07:37:03Z)
- Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts [56.30364248231053]
This paper introduces Multi-Modal Retrieval-Augmented Generation (M2RAG), a benchmark designed to evaluate the effectiveness of Multi-modal Large Language Models (MLLMs).
To enhance the context utilization capabilities of MLLMs, we also introduce Multi-Modal Retrieval-Augmented Instruction Tuning (MM-RAIT).
arXiv Detail & Related papers (2025-02-24T16:25:25Z)
- Multimodal Difference Learning for Sequential Recommendation [5.243083216855681]
We argue that user interests and item relationships vary across different modalities.
We propose MDSRec, a novel Multimodal Difference Learning framework for Sequential Recommendation.
Results on five real-world datasets demonstrate the superiority of MDSRec over state-of-the-art baselines.
arXiv Detail & Related papers (2024-12-11T05:08:19Z)
- CADMR: Cross-Attention and Disentangled Learning for Multimodal Recommender Systems [0.6037276428689637]
We propose CADMR, a novel autoencoder-based multimodal recommender system framework.
We evaluate CADMR on three benchmark datasets, demonstrating significant performance improvements over state-of-the-art methods.
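The summary names cross-attention as the fusion mechanism; a minimal sketch of cross-modal attention in that spirit follows. Dimensions, class name, and the residual connection are illustrative assumptions, not CADMR's architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Minimal cross-attention fusion sketch: one modality's features
    query the other's. Illustrative only."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_feat, image_feat):
        # text tokens attend over image tokens; the residual keeps the
        # original textual signal alongside the fused one
        fused, _ = self.attn(text_feat, image_feat, image_feat)
        return text_feat + fused

t = torch.randn(8, 10, 64)   # batch of 8 items, 10 text tokens each
v = torch.randn(8, 49, 64)   # 49 image patch features per item
out = CrossModalAttention()(t, v)   # shape (8, 10, 64)
```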
arXiv Detail & Related papers (2024-12-03T09:09:52Z)
- Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation [9.506245109666907]
Multi-faceted features characterizing products and services may influence each customer on online selling platforms differently.
The common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, and (iii) predicting the user-item score.
This paper is the first attempt to offer large-scale benchmarking for multimodal recommender systems, with a specific focus on multimodal extractors.
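For concreteness, the three pipeline stages above can be sketched as a minimal model: a frozen extractor's output, a learned refinement layer, and a dot-product user-item score. The class and parameters are hypothetical; real systems would plug in e.g. CLIP or Sentence-BERT features at stage (i).

```python
import torch
import torch.nn as nn

class MinimalMMRecPipeline(nn.Module):
    """Illustrative three-stage multimodal pipeline: precomputed item
    features stand in for the extractor, a linear layer refines them,
    and a dot product predicts the user-item score."""
    def __init__(self, feat_dim=512, emb_dim=64, n_users=1000):
        super().__init__()
        self.refine = nn.Linear(feat_dim, emb_dim)   # (ii) refinement
        self.user_emb = nn.Embedding(n_users, emb_dim)

    def forward(self, user_ids, item_features):
        item_emb = self.refine(item_features)        # refined item repr.
        u = self.user_emb(user_ids)
        return (u * item_emb).sum(-1)                # (iii) user-item score

scores = MinimalMMRecPipeline()(torch.tensor([0, 1]), torch.randn(2, 512))
```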
arXiv Detail & Related papers (2024-09-24T08:29:10Z)
- DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems [9.433227503973077]
We propose a novel framework designed to enable fine-grained control over diversity in LLM-based recommendations.
Unlike traditional methods, DLCRec adopts a fine-grained task decomposition strategy, breaking down the recommendation process into three sub-tasks.
We introduce two data augmentation techniques that enhance the model's robustness to noisy and out-of-distribution data.
arXiv Detail & Related papers (2024-08-22T15:10:56Z)
- DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models [39.49215596285211]
Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions.
We propose a novel framework called DimeRec that combines a guidance extraction module (GEM) and a generative diffusion aggregation module (DAM).
Our numerical experiments demonstrate that DimeRec significantly outperforms established baseline methods across three publicly available datasets.
arXiv Detail & Related papers (2024-08-22T06:42:09Z)
- A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
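The "single supervision format" step can be illustrated with a small data-normalization sketch: map ratings and thumbs-style feedback to one record type with a scalar preference, then filter a high-quality subset. All field and function names are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass

@dataclass
class UnifiedExample:
    """Hypothetical unified record: any feedback source becomes a
    prompt, a response, and a scalar preference usable by SFT or
    reward-model training."""
    prompt: str
    response: str
    preference: float   # 1.0 = preferred, 0.0 = dispreferred

def from_rating(prompt, response, stars, max_stars=5):
    return UnifiedExample(prompt, response, stars / max_stars)

def from_thumb(prompt, response, thumbs_up):
    return UnifiedExample(prompt, response, 1.0 if thumbs_up else 0.0)

dataset = [
    from_rating("Summarize X", "X is ...", stars=4),
    from_thumb("Translate Y", "Y -> ...", thumbs_up=True),
]
# a simple stand-in for the "high-quality subset" extraction step
subset = [ex for ex in dataset if ex.preference >= 0.8]
print(len(subset))
```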
arXiv Detail & Related papers (2024-08-05T23:20:32Z)
- NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks.
Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored.
We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
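A common pattern for wiring an existing vision encoder to an LLM is to project patch features into the LLM's token-embedding space and prepend them as soft tokens; the generic sketch below shows that pattern, not NoteLLM-2's published architecture. Dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class VisionToLLMAdapter(nn.Module):
    """Generic adapter sketch: project vision features into the LLM
    embedding space and concatenate them before the text tokens."""
    def __init__(self, vision_dim=768, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_feats, text_token_embs):
        visual_tokens = self.proj(patch_feats)            # (B, P, llm_dim)
        return torch.cat([visual_tokens, text_token_embs], dim=1)

seq = VisionToLLMAdapter()(torch.randn(2, 16, 768), torch.randn(2, 32, 4096))
print(seq.shape)   # (2, 48, 4096): soft visual tokens + text tokens
```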
arXiv Detail & Related papers (2024-05-27T03:24:01Z)
- TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content [21.90660366765994]
We propose a trustworthy sequential recommendation method that handles noisy user-generated multi-modal content.
Specifically, we capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference.
In addition, we design a trustworthy decision mechanism that integrates subjective user perspective and objective item perspective.
arXiv Detail & Related papers (2024-04-26T08:23:36Z)
- Multimodal Instruction Tuning with Conditional Mixture of LoRA [54.65520214291653]
This paper introduces a novel approach that integrates multimodal instruction tuning with Low-Rank Adaptation (LoRA).
It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance.
Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms the conventional LoRA with the same or even higher ranks.
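The per-instance construction can be illustrated with a mixture-of-LoRA-experts sketch: a router produces instance-dependent weights over several rank-r factor pairs, so the effective low-rank update varies per input. This is inspired by the MixLoRA idea above, not the authors' implementation; all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

class ConditionalLoRA(nn.Module):
    """Sketch of instance-conditioned low-rank adaptation with a
    learned router over several LoRA expert pairs."""
    def __init__(self, dim=64, rank=4, n_experts=3):
        super().__init__()
        self.A = nn.Parameter(torch.randn(n_experts, dim, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, dim))
        self.router = nn.Linear(dim, n_experts)

    def forward(self, x):                       # x: (batch, dim)
        gates = self.router(x).softmax(-1)      # per-instance expert weights
        # per-expert low-rank update: x @ A_e @ B_e, shape (batch, experts, dim)
        upd = torch.einsum('bd,edr,erk->bek', x, self.A, self.B)
        return x + torch.einsum('be,bek->bk', gates, upd)

y = ConditionalLoRA()(torch.randn(8, 64))
```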
arXiv Detail & Related papers (2024-02-24T20:15:31Z)
- Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima.
We propose a concise yet effective gradient strategy called Mirror Gradient (MG).
We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
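The summary does not reproduce the MG update rule; to convey the flat-local-minima objective it targets, the sketch below uses the well-known SAM recipe (perturb weights toward higher loss, evaluate the gradient there, update the original weights) purely as a stand-in. This is explicitly NOT the Mirror Gradient algorithm.

```python
import torch

def flat_minima_step(params, loss_fn, lr=0.01, rho=0.05):
    """SAM-style step used as a stand-in for flat-minima training:
    ascend to a nearby sharp point, then descend from the original
    weights using the gradient measured there."""
    loss = loss_fn(params)
    grad, = torch.autograd.grad(loss, params)
    eps = rho * grad / (grad.norm() + 1e-12)   # ascent perturbation
    loss_adv = loss_fn(params + eps)           # loss at the sharp point
    grad_adv, = torch.autograd.grad(loss_adv, params)
    return (params - lr * grad_adv).detach().requires_grad_()

w = torch.randn(10, requires_grad=True)
loss_fn = lambda p: ((p - 1.0) ** 2).sum()     # toy quadratic objective
for _ in range(100):
    w = flat_minima_step(w, loss_fn)
print(loss_fn(w).item())
```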
arXiv Detail & Related papers (2024-02-17T12:27:30Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input.
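A symmetric InfoNCE loss is a minimal stand-in for contrastive cross-modal alignment of the kind A2Summ's dual losses perform: matched (video_i, text_i) pairs are pulled together and all other pairings in the batch are pushed apart. The exact dual-loss formulation in the paper is not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between aligned modality pairs: row i of `a`
    should match row i of `b`, and mismatch every other row."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(a.size(0))          # diagonal = positive pairs
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

video_feat, text_feat = torch.randn(16, 128), torch.randn(16, 128)
print(info_nce(video_feat, text_feat))
```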
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
- Multimodal Recommender Systems: A Survey [50.23505070348051]
Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently.
In this paper, we give a comprehensive survey of MRS models, mainly from a technical view.
To provide access to more details of the surveyed papers, such as implementation code, we open-source a repository.
arXiv Detail & Related papers (2023-02-08T05:12:54Z)