Beyond Graph Convolution: Multimodal Recommendation with Topology-aware MLPs
- URL: http://arxiv.org/abs/2412.11747v1
- Date: Mon, 16 Dec 2024 13:05:13 GMT
- Title: Beyond Graph Convolution: Multimodal Recommendation with Topology-aware MLPs
- Authors: Junjie Huang, Jiarui Qin, Yong Yu, Weinan Zhang
- Abstract summary: Multimodal recommender systems exploit richer semantic information beyond user-item interactions.
Recent works highlight that leveraging Graph Convolutional Networks (GCNs) to explicitly model multimodal item-item relations can significantly enhance performance.
In this paper, we propose bypassing GCNs when modeling item-item relationships.
- Abstract: Given the large volume of side information from different modalities, multimodal recommender systems have become increasingly vital, as they exploit richer semantic information beyond user-item interactions. Recent works highlight that leveraging Graph Convolutional Networks (GCNs) to explicitly model multimodal item-item relations can significantly enhance recommendation performance. However, due to the inherent over-smoothing issue of GCNs, existing models benefit only from shallow GCNs with limited representation power. This drawback is especially pronounced when facing complex and high-dimensional patterns such as multimodal data, as it requires large-capacity models to accommodate complicated correlations. To this end, in this paper, we investigate bypassing GCNs when modeling multimodal item-item relationships. More specifically, we propose a Topology-aware Multi-Layer Perceptron (TMLP), which uses MLPs instead of GCNs to model the relationships between items. TMLP enhances MLPs with topological pruning to denoise item-item relations and intra (inter)-modality learning to integrate higher-order modality correlations. Extensive experiments on three real-world datasets verify TMLP's superiority over nine baselines. We also find that by discarding the internal message passing in GCNs, which is sensitive to node connections, TMLP achieves significant improvements in both training efficiency and robustness compared with existing models.
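The two ingredients named in the abstract, topological pruning of the item-item graph and an MLP in place of graph convolution, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the array sizes, the cosine-similarity kNN graph, and the two-layer MLP are all made up for the example.

```python
import numpy as np

def topk_prune(sim, k):
    """Topological pruning: keep only each item's k most similar neighbors."""
    adj = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]          # indices of the k largest similarities per row
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, idx] = sim[rows, idx]
    return adj

def mlp(x, w1, b1, w2, b2):
    """A plain two-layer MLP used in place of stacked graph convolutions."""
    h = np.maximum(x @ w1 + b1, 0.0)               # ReLU
    return h @ w2 + b2

rng = np.random.default_rng(0)
n_items, d = 6, 4
visual = rng.normal(size=(n_items, d))             # e.g. image embeddings (hypothetical)
textual = rng.normal(size=(n_items, d))            # e.g. text embeddings (hypothetical)

# Item-item similarity from modality features (cosine), then prune the noisy edges.
feats = np.concatenate([visual, textual], axis=1)
unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
adj = topk_prune(unit @ unit.T, k=2)

# Instead of propagating adj @ feats through GCN layers (message passing),
# transform the item features directly with an MLP; the pruned topology can
# then supervise the MLP, e.g. by pulling connected items' outputs together.
hidden = 8
w1 = rng.normal(size=(feats.shape[1], hidden)); b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, d)); b2 = np.zeros(d)
item_repr = mlp(feats, w1, b1, w2, b2)
```

The key design point: because no adjacency matrix enters the forward pass, the model's capacity is not tied to GCN depth, and inference does not depend on node connectivity.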
Related papers
- Topology-Aware Popularity Debiasing via Simplicial Complexes
Test-time Simplicial Propagation (TSP) incorporates simplicial complexes (SCs) to enhance the expressiveness of Graph Neural Networks (GNNs)
Our approach captures multi-order relationships through SCs, providing a more comprehensive representation of user-item interactions.
Our method produces more uniform distributions of item representations, leading to fairer and more accurate recommendations.
arXiv Detail & Related papers (2024-11-21T07:12:47Z)
- Multimodal Graph Neural Network for Recommendation with Dynamic De-redundancy and Modality-Guided Feature De-noisy
We propose Multimodal Graph Neural Network for Recommendation (MGNM) with Dynamic De-redundancy and Modality-Guided Feature De-noisy.
Experimental results demonstrate that MGNM achieves superior performance in denoising multimodal information and removing redundant information.
arXiv Detail & Related papers (2024-11-03T13:23:07Z)
- Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking
Multimodal Entity Linking aims to link ambiguous mentions in multimodal contexts to entities in a multimodal knowledge graph.
Existing methods employ various local correlation mechanisms, relying heavily on automatically learned attention weights.
We propose a novel MEL framework, namely OT-MEL, with OT-guided correlation assignment.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Noise-powered Multi-modal Knowledge Graph Representation Framework
The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph representation learning framework.
We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking.
Our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility.
arXiv Detail & Related papers (2024-03-11T15:48:43Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z)
- GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition
Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems since it can provide empathetic services.
In multimodal ERC, Graph Neural Networks (GNNs) are capable of extracting both long-distance contextual information and inter-modal interactive information.
We present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information.
arXiv Detail & Related papers (2022-07-06T13:56:48Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- Mining Latent Structures for Multimedia Recommendation
We propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity.
We learn item-item structures for each modality and aggregate multiple modalities to obtain latent item graphs.
Based on the learned latent graphs, we perform graph convolutions to explicitly inject high-order item affinities into item representations.
arXiv Detail & Related papers (2021-04-19T03:50:24Z)
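To make the graph-convolution step that LATTICE-style methods rely on (and that TMLP bypasses) concrete, here is a minimal sketch of one round of neighborhood aggregation over a latent item graph; the adjacency matrix and embeddings are invented for illustration:

```python
import numpy as np

# Hypothetical latent item graph over 5 items; in LATTICE such a graph
# is learned per modality and then aggregated across modalities.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 3))                 # item embeddings (made up)

# One round of row-normalized graph convolution: each item's embedding
# becomes the mean over itself and its neighbors, injecting item-item
# affinities into the representation.
A_hat = A + np.eye(5)                       # add self-loops
D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
H_next = D_inv * (A_hat @ H)                # message passing step
```

Stacking several such rounds injects higher-order affinities, but also drives neighboring embeddings toward each other, which is exactly the over-smoothing pressure that limits GCN depth in the main paper's argument.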
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.