Multimodal Representation-disentangled Information Bottleneck for Multimodal Recommendation
- URL: http://arxiv.org/abs/2509.20225v1
- Date: Wed, 24 Sep 2025 15:18:32 GMT
- Title: Multimodal Representation-disentangled Information Bottleneck for Multimodal Recommendation
- Authors: Hui Wang, Jinghui Qin, Wushao Wen, Qingling Li, Shanshan Zhong, Zhongzhan Huang
- Abstract summary: We propose a novel framework, the Multimodal Representation-disentangled Information Bottleneck (MRdIB). Concretely, we first employ a Multimodal Information Bottleneck to compress the input representations. Then, we decompose the information based on its relationship with the recommendation target into unique, redundant, and synergistic components.
- Score: 36.338586087343806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal data has significantly advanced recommendation systems by integrating diverse information sources to model user preferences and item characteristics. However, these systems often struggle with redundant and irrelevant information, which can degrade performance. Most existing methods either fuse multimodal information directly or use rigid architectural separation for disentanglement, failing to adequately filter noise and model the complex interplay between modalities. To address these challenges, we propose a novel framework, the Multimodal Representation-disentangled Information Bottleneck (MRdIB). Concretely, we first employ a Multimodal Information Bottleneck to compress the input representations, effectively filtering out task-irrelevant noise while preserving rich semantic information. Then, we decompose the information based on its relationship with the recommendation target into unique, redundant, and synergistic components. We achieve this decomposition with a series of constraints: a unique information learning objective to preserve modality-unique signals, a redundant information learning objective to minimize overlap, and a synergistic information learning objective to capture emergent information. By optimizing these objectives, MRdIB guides a model to learn more powerful and disentangled representations. Extensive experiments on several competitive models and three benchmark datasets demonstrate the effectiveness and versatility of our MRdIB in enhancing multimodal recommendation.
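The compression step the abstract describes follows the standard variational information bottleneck recipe: fit the recommendation target while penalizing how much the latent code retains about the input. The sketch below is illustrative only, not the paper's implementation; the Gaussian encoder statistics (`mu`, `log_var`), the `ib_loss` name, and the `beta` weight are all assumptions:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions and
    # averaged over the batch -- the "compression" term of the bottleneck.
    return np.mean(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))

def ib_loss(task_loss, mu, log_var, beta=1e-3):
    # Information-bottleneck objective: task fit plus a beta-weighted
    # penalty on how much the latent code retains about the input.
    return task_loss + beta * kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu = 0.1 * rng.normal(size=(4, 8))   # encoder means for a batch of 4 items
log_var = np.zeros((4, 8))           # unit-variance encoder
loss = ib_loss(task_loss=0.7, mu=mu, log_var=log_var)
```

With `beta = 0` the objective reduces to the plain task loss; increasing `beta` trades task fit for stronger filtering of task-irrelevant information, which is the noise-removal effect the abstract attributes to the bottleneck.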
Related papers
- Orthogonalized Multimodal Contrastive Learning with Asymmetric Masking for Structured Representations [4.67724003380452]
Multimodal learning seeks to integrate information from heterogeneous sources, where signals may be shared across modalities, specific to individual modalities, or emerge only through their interaction. While self-supervised multimodal contrastive learning has achieved remarkable progress, most existing methods predominantly capture redundant cross-modal signals, often neglecting modality-specific (unique) and interaction-driven (synergistic) information. Recent extensions broaden this perspective, yet they either fail to explicitly model synergistic interactions or learn different information components in an entangled manner, leading to incomplete representations and potential information leakage. We introduce COrAL, a principled framework.
arXiv Detail & Related papers (2026-02-16T18:06:53Z) - From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces. By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process. We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z) - Robust Multimodal Sentiment Analysis via Double Information Bottleneck [55.32835720742616]
Multimodal sentiment analysis has received significant attention across diverse research domains. Existing approaches suffer from insufficient learning of noise-contaminated unimodal data. This paper proposes a Double Information Bottleneck (DIB) strategy to obtain a powerful, unified compact multimodal representation.
arXiv Detail & Related papers (2025-11-03T10:52:45Z) - Structurally Refined Graph Transformer for Multimodal Recommendation [13.296555757708298]
We present SRGFormer, a structurally optimized multimodal recommendation model. By modifying the transformer for better integration into our model, we capture the overall behavior patterns of users. Then, we enhance structural information by embedding multimodal information into a hypergraph structure to aid in learning the local structures between users and items.
arXiv Detail & Related papers (2025-11-01T15:18:00Z) - I$^3$-MRec: Invariant Learning with Information Bottleneck for Incomplete Modality Recommendation [56.55935146424585]
We introduce I$^3$-MRec, which learns with the Information Bottleneck principle for Incomplete Modality Recommendation. By treating each modality as a distinct semantic environment, I$^3$-MRec employs invariant risk minimization (IRM) to learn preference-oriented representations. I$^3$-MRec consistently outperforms existing state-of-the-art MRS methods across various modality-missing scenarios.
arXiv Detail & Related papers (2025-08-06T09:29:50Z) - IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification [60.38841251693781]
We propose a novel framework to generate robust multi-modal object ReID representations. Our framework uses Modal Prefixes and InverseNet to integrate multi-modal information with semantic guidance from inverted text. Experiments on three multi-modal object ReID benchmarks demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2025-03-13T13:00:31Z) - CADMR: Cross-Attention and Disentangled Learning for Multimodal Recommender Systems [0.6037276428689637]
We propose CADMR, a novel autoencoder-based multimodal recommender system framework. We evaluate CADMR on three benchmark datasets, demonstrating significant performance improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-12-03T09:09:52Z) - Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation [12.306686291299146]
Multi-modal recommendation greatly enhances the performance of recommender systems.
Most existing multi-modal recommendation models exploit multimedia information propagation processes to enrich item representations.
We propose a novel framework to bridge the semantic gap between modalities and extract fine-grained multi-view semantic information.
arXiv Detail & Related papers (2024-07-07T15:56:03Z) - DiffMM: Multi-Modal Diffusion Model for Recommendation [19.43775593283657]
We propose a novel multi-modal graph diffusion model for recommendation called DiffMM.
Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning.
arXiv Detail & Related papers (2024-06-17T17:35:54Z) - SynerGraph: An Integrated Graph Convolution Network for Multimodal Recommendation [1.3812010983144802]
This article presents a novel approach to multimodal recommendation systems, focusing on integrating and purifying multimodal data.
We developed a filter to remove noise from various types of data, making the recommendations more reliable.
We studied the impact of top-K sparsification on different datasets, finding optimal values that strike a balance between underfitting and overfitting concerns.
arXiv Detail & Related papers (2024-05-29T12:18:32Z) - BiVRec: Bidirectional View-based Multimodal Sequential Recommendation [55.87443627659778]
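The top-K sparsification studied in the SynerGraph entry above amounts to keeping only the K strongest entries per row of a similarity matrix and zeroing the rest; too small a K risks underfitting (too few neighbors), too large a K risks overfitting to noisy edges. The `topk_sparsify` helper below is a hypothetical illustration, not the paper's code:

```python
import numpy as np

def topk_sparsify(sim, k):
    # Keep the k largest entries in each row of a similarity matrix
    # and zero out the remaining (n - k) entries.
    drop_idx = np.argsort(sim, axis=1)[:, :-k]   # indices of the smallest entries
    out = sim.copy()
    np.put_along_axis(out, drop_idx, 0.0, axis=1)
    return out

sim = np.array([[0.9, 0.2, 0.5],
                [0.1, 0.8, 0.3],
                [0.4, 0.6, 0.7]])
sparse = topk_sparsify(sim, k=2)   # each row keeps exactly its 2 largest entries
```

Sweeping `k` over a validation set, as the paper's dataset study suggests, is how one would locate the balance point between the two failure modes.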
We propose an innovative framework, BivRec, that jointly trains the recommendation tasks in both ID and multimodal views.
BivRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
arXiv Detail & Related papers (2024-02-27T09:10:41Z) - Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima.
We propose a concise yet effective gradient strategy called Mirror Gradient (MG).
We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
arXiv Detail & Related papers (2024-02-17T12:27:30Z) - Factorized Contrastive Learning: Going Beyond Multi-view Redundancy [116.25342513407173]
This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy.
On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-06-08T15:17:04Z) - Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations [27.855467591358018]
We introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation.
We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints.
Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition.
arXiv Detail & Related papers (2022-10-31T16:14:18Z)
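The early- versus late-fusion distinction drawn by the MIB variants in the entry above can be sketched in a few lines; the function names and the averaging combiner below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def early_fusion(text_feat, image_feat):
    # Early fusion: concatenate unimodal features first, so a single
    # joint representation is passed through one bottleneck.
    return np.concatenate([text_feat, image_feat], axis=-1)

def late_fusion(text_code, image_code):
    # Late fusion: each modality is bottlenecked separately and the
    # compressed codes are combined afterwards (here, by averaging).
    return (text_code + image_code) / 2.0

t = np.ones((2, 4))    # toy text features for a batch of 2
v = np.zeros((2, 4))   # toy image features
joint = early_fusion(t, v)   # shape (2, 8): compressed as one code
fused = late_fusion(t, v)    # shape (2, 4): codes compressed per modality
```

A "complete" variant, as the entry suggests, would apply information constraints at both stages rather than choosing one fusion point.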
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.