EGRA: Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation
- URL: http://arxiv.org/abs/2508.16170v1
- Date: Fri, 22 Aug 2025 07:47:54 GMT
- Title: EGRA: Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation
- Authors: Xiaoxiong Zhang, Xin Zhou, Zhiwei Zeng, Yongjie Wang, Dusit Niyato, Zhiqi Shen
- Abstract summary: MultiModal Recommendation (MMR) systems have emerged as a promising solution for improving recommendation quality by leveraging rich item-side modality information. We propose EGRA, which incorporates into the behavior graph an item-item graph built from representations generated by a pretrained MMR model. It also introduces a novel bi-level dynamic alignment weighting mechanism to improve modality-behavior representation alignment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: MultiModal Recommendation (MMR) systems have emerged as a promising solution for improving recommendation quality by leveraging rich item-side modality information, prompting a surge of diverse methods. Despite these advances, existing methods still face two critical limitations. First, they use raw modality features to construct item-item links for enriching the behavior graph, while giving limited attention to balancing collaborative and modality-aware semantics or mitigating modality noise in the process. Second, they use a uniform alignment weight across all entities and maintain a fixed alignment strength throughout training, limiting the effectiveness of modality-behavior alignment. To address these challenges, we propose EGRA. First, instead of relying on raw modality features, it alleviates sparsity by incorporating into the behavior graph an item-item graph built from representations generated by a pretrained MMR model. This enables the graph to capture both collaborative patterns and modality-aware similarities with enhanced robustness against modality noise. Moreover, it introduces a novel bi-level dynamic alignment weighting mechanism to improve modality-behavior representation alignment, which dynamically assigns alignment strength across entities according to their alignment degree, while gradually increasing the overall alignment intensity throughout training. Extensive experiments on five datasets show that EGRA significantly outperforms recent methods, confirming its effectiveness.
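The abstract describes two components without implementation detail: an item-item graph built from pretrained MMR item representations, and a bi-level weighting that is per-entity (scaled by alignment degree) and global (ramped over training). A minimal sketch of how such components could look, assuming a cosine-kNN graph construction, an exponential per-entity rule, and a linear global ramp; all function names, the `k` parameter, and both weighting rules are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def knn_item_graph(item_emb, k=10):
    """Hypothetical item-item graph: connect each item to its k nearest
    neighbors by cosine similarity of pretrained MMR embeddings."""
    unit = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        adj[i, np.argsort(sim[i])[-k:]] = 1.0  # top-k neighbors per item
    return adj

def alignment_weights(align_degree, epoch, total_epochs, base=1.0):
    """Bi-level weighting sketch: entities that are already well aligned
    (high align_degree) get a smaller per-entity weight, while a global
    schedule ramps the overall alignment intensity over training."""
    entity_w = np.exp(-align_degree)          # assumed per-entity rule
    global_w = base * (epoch / total_epochs)  # assumed linear ramp
    return global_w * entity_w
```

The resulting adjacency could be merged into the user-item behavior graph, and the returned weights could scale a per-entity modality-behavior alignment loss term.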
Related papers
- Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection [54.10252086842123]
Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos. This paper proposes a modality optimization and dynamic primary modality selection framework (MODS). Experiments on four benchmark datasets demonstrate that MODS outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-11-09T11:13:32Z)
- Decoding Visual Neural Representations by Multimodal with Dynamic Balancing [8.355081324607537]
We propose an innovative framework that integrates EEG, image, and text data, aiming to decode visual neural representations from low signal-to-noise ratio EEG signals. We introduce the text modality to enhance the semantic correspondence between EEG signals and visual content. Our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy, improving by 2.0% and 4.7% respectively.
arXiv Detail & Related papers (2025-09-03T16:03:59Z)
- Semantic Item Graph Enhancement for Multimodal Recommendation [49.66272783945571]
Multimodal recommendation systems have attracted increasing attention for the improved performance they achieve by leveraging items' multimodal information. Prior methods often build modality-specific item-item semantic graphs from raw modality features. These semantic graphs suffer from semantic deficiencies, including insufficient modeling of collaborative signals among items.
arXiv Detail & Related papers (2025-08-08T09:20:50Z) - SLIF-MR: Self-loop Iterative Fusion of Heterogeneous Auxiliary Information for Multimodal Recommendation [13.3951304427872]
We propose a novel framework termed Self-loop Iterative Fusion of Heterogeneous Auxiliary Information for Multimodal Recommendation (SLIF-MR)<n>SLIF-MR leverages item representations from previous training epoch as feedback signals to dynamically optimize the heterogeneous graph structures composed of KG, multimodal item feature graph, and user-item interaction graph.<n>Experiments show that SLIF-MR significantly outperforms existing methods, particularly in terms of accuracy and robustness.
arXiv Detail & Related papers (2025-06-29T06:41:00Z)
- MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models. MoCa consistently improves performance across the MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-23T15:20:58Z)
- Fast State-Augmented Learning for Wireless Resource Allocation with Dual Variable Regression [83.27791109672927]
We show how a state-augmented graph neural network (GNN) parametrization for the resource allocation policy circumvents the drawbacks of the ubiquitous dual subgradient methods. Lagrangian-maximizing state-augmented policies are learned during the offline training phase. We prove a convergence result and an exponential probability bound on the excursions of the dual function (iterate) optimality gaps.
arXiv Detail & Related papers (2025-06-12T12:47:35Z)
- Contrastive Matrix Completion with Denoising and Augmented Graph Views for Robust Recommendation [1.0128808054306186]
Matrix completion is a widely adopted framework in recommender systems. We propose a novel method called Matrix Completion using Contrastive Learning (MCCL). Our approach not only improves the numerical accuracy of the predicted scores but also produces superior rankings, with improvements of up to 36% in ranking metrics.
arXiv Detail & Related papers (2025-01-22T09:09:17Z)
- GRAMA: Adaptive Graph Autoregressive Moving Average Models [26.755971450887333]
We introduce GRAMA, a graph-adaptive method based on a learnable Autoregressive Moving Average (ARMA) framework. By transforming static graph data into sequential form, GRAMA enables efficient and flexible long-range information propagation. We also establish theoretical connections between GRAMA and Selective SSMs, providing insights into its ability to capture long-range dependencies.
arXiv Detail & Related papers (2023-12-11T07:36:45Z)
- Dynamic Weighted Combiner for Mixed-Modal Image Retrieval [8.683144453481328]
Mixed-Modal Image Retrieval (MMIR) as a flexible search paradigm has attracted wide attention.
Previous approaches always achieve limited performance, due to two critical factors.
We propose a Dynamic Weighted Combiner (DWC) to tackle the above challenges.
arXiv Detail & Related papers (2022-11-03T18:12:32Z)
- Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization [68.49738668084693]
Self-supervised pre-training recently demonstrates success on large-scale multimodal data.
Cross-modality alignment (CMA) is only a weak and noisy supervision.
CMA might cause conflicts and biases among modalities.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.