Related papers: Knowledge-aware Diffusion-Enhanced Multimedia Recommendation

URL: http://arxiv.org/abs/2507.16396v1
Date: Tue, 22 Jul 2025 09:47:56 GMT
Title: Knowledge-aware Diffusion-Enhanced Multimedia Recommendation
Authors: Xian Mo, Fei Liu, Rui Tang, Jintao Gao, Hao Liu
Abstract summary: We propose a Knowledge-aware Diffusion-Enhanced architecture using contrastive learning paradigms (KDiffE) for multimedia recommendations. We first utilize original user-item graphs to build an attention-aware matrix into graph neural networks. Then, we propose a guided diffusion model to generate strongly task-relevant knowledge graphs with less noise for constructing a knowledge-aware contrastive view.
Score: 9.12236232752614
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimedia recommendations aim to use rich multimedia content to enhance historical user-item interaction information, which can not only indicate the content relatedness among items but also reveal finer-grained preferences of users. In this paper, we propose a Knowledge-aware Diffusion-Enhanced architecture using contrastive learning paradigms (KDiffE) for multimedia recommendations. Specifically, we first utilize original user-item graphs to build an attention-aware matrix into graph neural networks, which can learn the importance between users and items for main view construction. The attention-aware matrix is constructed by adopting a random walk with restart strategy, which preserves the importance between users and items and is used to aggregate attention-aware node features. Then, we propose a guided diffusion model to generate strongly task-relevant knowledge graphs with less noise for constructing a knowledge-aware contrastive view. The model uses the embeddings of users connected to an item by an edge to guide the generation of these knowledge graphs, enhancing the item's semantic information. We perform comprehensive experiments on three multimedia datasets that demonstrate the effectiveness of KDiffE and its components compared with various state-of-the-art methods. Our source code is available at https://github.com/1453216158/KDiffE.
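
Illustration: the abstract mentions a random walk with restart over the user-item graph. The sketch below shows the general technique on a toy bipartite graph, not the authors' implementation. The helper name rwr_matrix, the restart probability of 0.15, the iteration count, and the toy interaction matrix are all assumptions made for illustration; see the linked repository for the actual KDiffE code.

```python
import numpy as np


def rwr_matrix(adj, restart=0.15, iters=50):
    """Random walk with restart scores for every start node (hypothetical helper)."""
    n = adj.shape[0]
    # Column-normalize so that each column is a transition distribution.
    col_sums = adj.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    transition = adj / col_sums
    restart_vectors = np.eye(n)  # column j restarts at node j
    scores = restart_vectors.copy()
    for _ in range(iters):
        # Walk one step with probability (1 - restart), else jump back to the start.
        scores = (1 - restart) * transition @ scores + restart * restart_vectors
    return scores  # scores[i, j]: relevance of node i for walks restarting at node j


# Toy bipartite graph (assumed data): 2 users (nodes 0-1), 3 items (nodes 2-4).
interactions = np.array([[1, 1, 0],
                         [0, 1, 1]], dtype=float)
n_users, n_items = interactions.shape
adj = np.zeros((n_users + n_items, n_users + n_items))
adj[:n_users, n_users:] = interactions
adj[n_users:, :n_users] = interactions.T

scores = rwr_matrix(adj)
# User-to-item importance weights that could serve as attention-like coefficients.
user_item_importance = scores[n_users:, :n_users].T
print(np.round(user_item_importance, 3))
```

In this toy graph, user 0 receives a higher score for item 2 (a direct interaction) than for item 4 (reachable only through user 1), which is the kind of importance signal the abstract describes preserving.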

Related papers

Knowledge graph-based personalized multimodal recommendation fusion framework [8.468510273008393]
This paper reviews existing knowledge graph recommendation frameworks, identifying shortcomings in modal interaction and higher-order dependency modeling. We propose the Cross-Graph Cross-Modal Mutual Information-Driven Unified Knowledge Graph Learning and Recommendation Framework (CrossGMMI-DUKGLR).
arXiv Detail & Related papers (2025-09-03T02:17:28Z)
Less is More: Information Bottleneck Denoised Multimedia Recommendation [43.66791467993419]
We propose a denoised multimedia recommendation paradigm via the Information Bottleneck principle (IB). IBMRec removes task-irrelevant features from both feature and item-item structure perspectives. It is achieved by maximizing the mutual information between multimedia representation and recommendation tasks.
arXiv Detail & Related papers (2025-01-21T14:33:07Z)
Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
We propose a novel framework that aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes. The core idea is to map users and items into discrete codes rich in collaborative information for reliable and informative contrastive view generation.
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
Augmented Commonsense Knowledge for Remote Object Grounding [67.30864498454805]
We propose an augmented commonsense knowledge model (ACK) to leverage commonsense information as a temporal knowledge graph for improving agent navigation. ACK consists of knowledge graph-aware cross-modal and concept aggregation modules to enhance visual representation and visual-textual data alignment. We add a new pipeline for the commonsense-based decision-making process which leads to more accurate local action prediction.
arXiv Detail & Related papers (2024-06-03T12:12:33Z)
Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation [51.80447197290866]
Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given knowledge graphs. Existing MMKGC methods usually extract multi-modal features with pre-trained models. We introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities.
arXiv Detail & Related papers (2024-04-15T05:40:41Z)
Intent-aware Multi-source Contrastive Alignment for Tag-enhanced Recommendation [46.04494053005958]
We seek an alternative framework that is light and effective through self-supervised learning across different sources of information. We use a self-supervision signal to pair users with the auxiliary information associated with the items they have interacted with before. We show that our method can achieve better performance while requiring less training time.
arXiv Detail & Related papers (2022-11-11T17:43:19Z)
Conditional Attention Networks for Distilling Knowledge Graphs in Recommendation [74.14009444678031]
We propose Knowledge-aware Conditional Attention Networks (KCAN) to incorporate knowledge graph into a recommender system. We use a knowledge-aware attention propagation manner to obtain the node representation first, which captures the global semantic similarity on the user-item network and the knowledge graph. Then, by applying a conditional attention aggregation on the subgraph, we refine the knowledge graph to obtain target-specific node representations.
arXiv Detail & Related papers (2021-11-03T09:40:43Z)
Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation [22.701371886522494]
We argue that the latent semantic item-item structures underlying multimodal contents could be beneficial for learning better item representations. We devise a novel modality-aware structure learning module, which learns item-item relationships for each modality.
arXiv Detail & Related papers (2021-11-01T03:37:02Z)
Deep Contrastive Learning for Multi-View Network Embedding [20.035449838566503]
Multi-view network embedding aims at projecting nodes in the network to low-dimensional vectors. Most contrastive learning-based methods rely on high-quality graph embedding. We design a novel node-to-node Contrastive learning framework for Multi-view network Embedding (CREME).
arXiv Detail & Related papers (2021-08-16T06:29:18Z)
Mining Latent Structures for Multimedia Recommendation [46.70109406399858]
We propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity. We learn item-item structures for each modality and aggregate multiple modalities to obtain latent item graphs. Based on the learned latent graphs, we perform graph convolutions to explicitly inject high-order item affinities into item representations.
arXiv Detail & Related papers (2021-04-19T03:50:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.