Semantic Item Graph Enhancement for Multimodal Recommendation
- URL: http://arxiv.org/abs/2508.06154v1
- Date: Fri, 08 Aug 2025 09:20:50 GMT
- Title: Semantic Item Graph Enhancement for Multimodal Recommendation
- Authors: Xiaoxiong Zhang, Xin Zhou, Zhiwei Zeng, Dusit Niyato, Zhiqi Shen
- Abstract summary: Multimodal recommendation systems have attracted increasing attention for their improved performance by leveraging items' multimodal information. Prior methods often build modality-specific item-item semantic graphs from raw modality features. These semantic graphs suffer from semantic deficiencies, including insufficient modeling of collaborative signals among items.
- Score: 49.66272783945571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal recommendation systems have attracted increasing attention for their improved performance by leveraging items' multimodal information. Prior methods often build modality-specific item-item semantic graphs from raw modality features and use them as supplementary structures alongside the user-item interaction graph to enhance user preference learning. However, these semantic graphs suffer from semantic deficiencies, including (1) insufficient modeling of collaborative signals among items and (2) structural distortions introduced by noise in raw modality features, ultimately compromising performance. To address these issues, we first extract collaborative signals from the interaction graph and infuse them into each modality-specific item semantic graph to enhance semantic modeling. Then, we design a modulus-based personalized embedding perturbation mechanism that injects perturbations with modulus-guided personalized intensity into embeddings to generate contrastive views. This enables the model to learn noise-robust representations through contrastive learning, thereby reducing the effect of structural noise in semantic graphs. Besides, we propose a dual representation alignment mechanism that first aligns multiple semantic representations via a designed Anchor-based InfoNCE loss using behavior representations as anchors, and then aligns behavior representations with the fused semantics by standard InfoNCE, to ensure representation consistency. Extensive experiments on four benchmark datasets validate the effectiveness of our framework.
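The abstract's two core mechanisms, modulus-guided personalized perturbation and InfoNCE-based alignment, can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the exact perturbation scaling and temperature are guesses, and `modulus_perturb` / `info_nce` are hypothetical names.

```python
import numpy as np

def modulus_perturb(emb, base_eps=0.1, rng=None):
    """Perturb each embedding with an intensity proportional to its L2 norm
    (modulus), so rows with larger moduli receive proportionally larger noise.
    A hypothetical reading of the paper's modulus-guided perturbation."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(emb.shape)
    noise /= np.linalg.norm(noise, axis=1, keepdims=True) + 1e-8  # unit directions
    scale = base_eps * np.linalg.norm(emb, axis=1, keepdims=True)  # per-row modulus
    return emb + scale * noise

def info_nce(anchors, positives, temperature=0.2):
    """Standard InfoNCE: the positive for each anchor is the same-index row;
    all other rows in the batch act as negatives."""
    a = anchors / (np.linalg.norm(anchors, axis=1, keepdims=True) + 1e-8)
    p = positives / (np.linalg.norm(positives, axis=1, keepdims=True) + 1e-8)
    logits = a @ p.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Two independently perturbed copies of the item embeddings would form the contrastive views; for the anchor-based variant described above, behavior representations would play the role of `anchors` while each modality's semantic representations supply the `positives`.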
Related papers
- IGDMRec: Behavior Conditioned Item Graph Diffusion for Multimodal Recommendation [21.87097387902408]
Multimodal recommender systems (MRSs) are critical for various online platforms, offering users more accurate personalized recommendations by incorporating multimodal information. We propose Item Graph Diffusion for Multimodal Recommendation (IGDMRec), a novel method that leverages a diffusion model with classifier-free guidance to denoise the semantic item graph. Extensive experiments on four real-world datasets demonstrate the superiority of IGDMRec over competitive baselines.
arXiv Detail & Related papers (2025-12-23T02:13:01Z) - Integrating Structure-Aware Attention and Knowledge Graphs in Explainable Recommendation Systems [2.620825811168925]
This paper implements an explainable recommendation model that integrates knowledge graphs with structure-aware attention mechanisms. The model is built on graph neural networks and incorporates a multi-hop neighbor aggregation strategy. Experiments conducted on the Amazon Books dataset validate the superior performance of the proposed model.
arXiv Detail & Related papers (2025-10-11T08:39:34Z) - EGRA: Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation [50.848374648774374]
MultiModal Recommendation (MMR) systems have emerged as a promising solution for improving recommendation quality by leveraging rich item-side modality information. We propose EGRA, which incorporates into the behavior graph an item-item graph built from representations generated by a pretrained MMR model. It also introduces a novel bi-level dynamic alignment weighting mechanism to improve modality-behavior representation alignment.
arXiv Detail & Related papers (2025-08-22T07:47:54Z) - Dual-Perspective Disentangled Multi-Intent Alignment for Enhanced Collaborative Filtering [7.031525324133112]
Disentangling user intents from implicit feedback has emerged as a promising strategy for enhancing the accuracy and interpretability of recommendation systems. We propose DMICF, a dual-perspective collaborative filtering framework that unifies intent alignment, structural fusion, and discriminative training. DMICF consistently delivers robust performance across datasets with diverse interaction distributions.
arXiv Detail & Related papers (2025-06-13T07:44:42Z) - Leveraging Foundation Models for Multimodal Graph-Based Action Recognition [1.533133219129073]
We introduce a graph-based framework that integrates vision-temporal foundation models, leveraging VideoMAE for dynamic visual encoding and BERT for contextual textual embedding. We show that our method consistently outperforms state-of-the-art baselines on diverse benchmark datasets.
arXiv Detail & Related papers (2025-05-21T07:15:14Z) - BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation [15.818669767036592]
We propose a Behavior-Bind multi-modal Quantization for Sequential Recommendation (BBQRec) featuring dual-aligned quantization and semantics-aware sequence modeling. BBQRec disentangles modality-agnostic behavioral patterns from noisy modality-specific features through contrastive codebook learning. We design a discretized similarity reweighting mechanism that dynamically adjusts self-attention scores using quantized semantic relationships.
arXiv Detail & Related papers (2025-04-09T07:19:48Z) - "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z) - Graph with Sequence: Broad-Range Semantic Modeling for Fake News Detection [18.993270952535465]
BREAK is a broad-range semantics model for fake news detection. It leverages a fully connected graph to capture comprehensive semantics. It employs dual denoising modules to minimize both structural and feature noise.
arXiv Detail & Related papers (2024-12-07T14:35:46Z) - An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - DyTed: Disentangled Representation Learning for Discrete-time Dynamic Graph [59.583555454424]
We propose a novel disenTangled representation learning framework for discrete-time Dynamic graphs, namely DyTed.
We specially design a temporal-clips contrastive learning task together with a structure contrastive learning to effectively identify the time-invariant and time-varying representations respectively.
arXiv Detail & Related papers (2022-10-19T14:34:12Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z) - Graph Contrastive Learning with Adaptive Augmentation [23.37786673825192]
We propose a novel graph contrastive representation learning method with adaptive augmentation.
Specifically, we design augmentation schemes based on node centrality measures to highlight important connective structures.
Our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts.
arXiv Detail & Related papers (2020-10-27T15:12:21Z)
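The centrality-guided augmentation in the last entry can be illustrated with a short sketch. This is a simplified, hypothetical version that uses node degree as the importance measure and drops edges between unimportant nodes more often; the original method's exact centrality scores and probability truncation are not reproduced here.

```python
import numpy as np

def centrality_drop_probs(adj, p_min=0.1, p_max=0.7):
    """Assign each undirected edge a drop probability inversely related to the
    (log-)degree centrality of its endpoints, so edges touching important,
    well-connected nodes are removed less often during augmentation."""
    deg = adj.sum(axis=1)
    rows, cols = np.nonzero(np.triu(adj, k=1))       # each edge once
    s = (np.log1p(deg[rows]) + np.log1p(deg[cols])) / 2  # edge importance
    s = (s - s.min()) / (s.max() - s.min() + 1e-8)       # normalize to [0, 1]
    return rows, cols, p_max - s * (p_max - p_min)       # important edge -> low p

def drop_edges(adj, rng=None):
    """Sample one augmented view by removing edges with their drop probabilities."""
    rng = np.random.default_rng(0) if rng is None else rng
    rows, cols, p = centrality_drop_probs(adj)
    keep = rng.random(p.shape) >= p
    out = np.zeros_like(adj)
    out[rows[keep], cols[keep]] = 1
    return out + out.T                                # restore symmetry
```

Two such views, sampled independently, would serve as the contrastive pair; because important connective structure is preserved with higher probability, the views stay semantically close to the original graph.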
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.