IGDMRec: Behavior Conditioned Item Graph Diffusion for Multimodal Recommendation
- URL: http://arxiv.org/abs/2512.19983v1
- Date: Tue, 23 Dec 2025 02:13:01 GMT
- Title: IGDMRec: Behavior Conditioned Item Graph Diffusion for Multimodal Recommendation
- Authors: Ziyuan Guo, Jie Guo, Zhenghao Chen, Bin Song, Fei Richard Yu
- Abstract summary: Multimodal recommender systems (MRSs) are critical for various online platforms, offering users more accurate personalized recommendations by incorporating multimodal information. We propose Item Graph Diffusion for Multimodal Recommendation (IGDMRec), a novel method that leverages a diffusion model with classifier-free guidance to denoise the semantic item graph. Extensive experiments on four real-world datasets demonstrate the superiority of IGDMRec over competitive baselines.
- Score: 21.87097387902408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal recommender systems (MRSs) are critical for various online platforms, offering users more accurate personalized recommendations by incorporating multimodal information of items. Structure-based MRSs have achieved state-of-the-art performance by constructing semantic item graphs, which explicitly model relationships between items based on modality feature similarity. However, such semantic item graphs are often noisy due to 1) inherent noise in multimodal information and 2) misalignment between item semantics and user-item co-occurrence relationships, which introduces false links and leads to suboptimal recommendations. To address this challenge, we propose Item Graph Diffusion for Multimodal Recommendation (IGDMRec), a novel method that leverages a diffusion model with classifier-free guidance to denoise the semantic item graph by integrating user behavioral information. Specifically, IGDMRec introduces a Behavior-conditioned Graph Diffusion (BGD) module, incorporating interaction data as conditioning information to guide the denoising of the semantic item graph. Additionally, a Conditional Denoising Network (CD-Net) is designed to implement the denoising process with manageable complexity. Finally, we propose a contrastive representation augmentation scheme that leverages both the denoised item graph and the original item graph to enhance item representations. Extensive experiments on four real-world datasets demonstrate the superiority of IGDMRec over competitive baselines, with robustness analysis validating its denoising capability and ablation studies verifying the effectiveness of its key components.
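The two generic building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `knn_item_graph` and `cfg_combine`, the top-k cosine-similarity graph construction, and the guidance weight `w` are assumptions chosen for illustration, and the guidance rule shown is the commonly used classifier-free guidance combination rather than anything confirmed for IGDMRec.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_item_graph(features, k):
    """Hypothetical semantic item graph: link each item to its k most
    similar other items by cosine similarity of modality features."""
    graph = {}
    for i, fi in enumerate(features):
        scored = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        # Highest similarity first; ties broken by item index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        graph[i] = [j for _, j in scored[:k]]
    return graph


def cfg_combine(cond_pred, uncond_pred, w):
    """Common classifier-free guidance rule, applied elementwise:
    (1 + w) * conditional - w * unconditional.
    With w = 0 this reduces to the conditional prediction."""
    return [(1 + w) * c - w * u for c, u in zip(cond_pred, uncond_pred)]


if __name__ == "__main__":
    # Toy modality features for four items (made-up values).
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(knn_item_graph(feats, k=1))
    print(cfg_combine([1.0, 2.0], [0.0, 1.0], w=1.0))
```

In a behavior-conditioned setting such as the one the abstract describes, the conditional prediction would come from a denoiser that sees user interaction data and the unconditional one from the same denoiser with that condition dropped; how IGDMRec realizes this is specified in the paper, not here.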
Related papers
- EGRA: Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation [50.848374648774374]
MultiModal Recommendation (MMR) systems have emerged as a promising solution for improving recommendation quality by leveraging rich item-side modality information. We propose EGRA, which incorporates into the behavior graph an item-item graph built from representations generated by a pretrained MMR model. It also introduces a novel bi-level dynamic alignment weighting mechanism to improve modality-behavior representation alignment.
arXiv Detail & Related papers (2025-08-22T07:47:54Z)
- Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation [19.01114538768217]
We propose a novel framework for REfining multi-modAl contRastive learning and hoMography relations (REARM). Our experiments on three real-world datasets demonstrate the superiority of REARM to various state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-19T11:35:48Z)
- Semantic Item Graph Enhancement for Multimodal Recommendation [49.66272783945571]
Multimodal recommendation systems have attracted increasing attention for their improved performance by leveraging items' multimodal information. Prior methods often build modality-specific item-item semantic graphs from raw modality features. These semantic graphs suffer from semantic deficiencies, including insufficient modeling of collaborative signals among items.
arXiv Detail & Related papers (2025-08-08T09:20:50Z)
- SLIF-MR: Self-loop Iterative Fusion of Heterogeneous Auxiliary Information for Multimodal Recommendation [13.3951304427872]
We propose a novel framework termed Self-loop Iterative Fusion of Heterogeneous Auxiliary Information for Multimodal Recommendation (SLIF-MR). SLIF-MR leverages item representations from the previous training epoch as feedback signals to dynamically optimize the heterogeneous graph structures composed of the KG, the multimodal item feature graph, and the user-item interaction graph. Experiments show that SLIF-MR significantly outperforms existing methods, particularly in terms of accuracy and robustness.
arXiv Detail & Related papers (2025-07-14T07:32:16Z)
- Graph-based Diffusion Model for Collaborative Filtering [10.654721251152187]
We propose a Graph-based Diffusion Model for Collaborative Filtering (GDMCF). GDMCF consistently outperforms state-of-the-art methods, highlighting its effectiveness in capturing higher-order collaborative signals.
arXiv Detail & Related papers (2025-04-07T12:51:18Z)
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
- Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
We propose a novel framework that aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes. The core idea is to map users and items into discrete codes rich in collaborative information for reliable and informative contrastive view generation.
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
- Relation Modeling and Distillation for Learning with Noisy Labels [4.556974104115929]
This paper proposes a relation modeling and distillation framework that models inter-sample relationships via self-supervised learning.
The proposed framework can learn discriminative representations for noisy data, which results in performance superior to that of existing methods.
arXiv Detail & Related papers (2024-05-30T01:47:27Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks.
We train a more effective cross-modal model that can adaptively incorporate key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Pre-training Graph Transformer with Multimodal Side Information for Recommendation [82.4194024706817]
We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
arXiv Detail & Related papers (2020-10-23T10:30:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.