Multi-modal Adaptive Mixture of Experts for Cold-start Recommendation
- URL: http://arxiv.org/abs/2508.08042v1
- Date: Mon, 11 Aug 2025 14:47:14 GMT
- Title: Multi-modal Adaptive Mixture of Experts for Cold-start Recommendation
- Authors: Van-Khang Nguyen, Duc-Hoang Pham, Huy-Son Nguyen, Cam-Van Thi Nguyen, Hoang-Quynh Le, Duc-Trong Le
- Abstract summary: MAMEX is a novel framework for multimodal cold-start recommendation. It dynamically leverages latent representations from different modalities. Experiments show MAMEX outperforms state-of-the-art methods in cold-start scenarios.
- Score: 1.9967512860886603
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recommendation systems face significant challenges in cold-start scenarios, where new items with a limited interaction history must be effectively recommended to users. Though multimodal data (e.g., images, text, audio, etc.) offer rich information to address this issue, existing approaches often employ simplistic integration methods such as concatenation, average pooling, or fixed weighting schemes, which fail to capture the complex relationships between modalities. Our study proposes a novel Mixture of Experts (MoE) framework for multimodal cold-start recommendation, named MAMEX, which dynamically leverages latent representations from different modalities. MAMEX utilizes modality-specific expert networks and introduces a learnable gating mechanism that adaptively weights the contribution of each modality based on its content characteristics. This approach enables MAMEX to emphasize the most informative modalities for each item while maintaining robustness when certain modalities are less relevant or missing. Extensive experiments on benchmark datasets show that MAMEX outperforms state-of-the-art methods in cold-start scenarios, with superior accuracy and adaptability. For reproducibility, the code has been made available on GitHub: https://github.com/L2R-UET/MAMEX.
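The abstract describes modality-specific expert networks combined through a learnable gate that weights each modality per item and tolerates missing modalities. The following is a minimal illustrative sketch of that general idea (not the paper's actual implementation; all class and parameter names here are hypothetical, and the weights are untrained placeholders):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ModalityMoE:
    """Toy mixture of modality-specific experts with a softmax gate.

    Each modality gets its own linear 'expert' projecting its features into a
    shared space; a per-modality gate scores the modality from its own content,
    and the softmax over available modalities yields adaptive fusion weights.
    Missing modalities are simply skipped, so the gate renormalizes over the
    modalities that are present.
    """
    def __init__(self, modality_dims, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One linear expert per modality (random placeholder weights).
        self.experts = {m: rng.normal(0.0, 0.1, (d, hidden_dim))
                        for m, d in modality_dims.items()}
        # Gate: a content-based score per modality.
        self.gates = {m: rng.normal(0.0, 0.1, (d, 1))
                      for m, d in modality_dims.items()}

    def forward(self, features):
        # features: dict mapping modality name -> (dim,) feature vector.
        names = [m for m in self.experts if m in features]
        scores = np.array([features[m] @ self.gates[m] for m in names]).ravel()
        weights = softmax(scores)  # adaptive per-item modality weights
        outs = np.stack([features[m] @ self.experts[m] for m in names])
        return weights @ outs, dict(zip(names, weights))
```

For example, an item with image and text features yields a fused vector plus per-modality weights that sum to one; dropping a modality leaves the remaining weights renormalized, which mirrors the robustness-to-missing-modalities claim.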
Related papers
- CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation [23.478610632707728]
We propose a Category-guided Attentive Mixture of Experts model for Multimodal Sequential Recommendation. At its core, CAMMSR introduces a category-guided attentive mixture of experts module, which learns specialized item representations from multiple perspectives. Experiments on four public datasets demonstrate that CAMMSR consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-04T17:39:35Z) - PRISM: Personalized Recommendation via Information Synergy Module [12.797662213207936]
PRISM is a plug-and-play framework for sequential recommendation (SR). It decomposes multimodal information into unique, redundant, and synergistic components. Experiments on four datasets and three SR backbones demonstrate its effectiveness and versatility.
arXiv Detail & Related papers (2026-01-16T02:17:54Z) - Revisiting scalable sequential recommendation with Multi-Embedding Approach and Mixture-of-Experts [15.976682531132676]
We propose Fuxi-MME, a framework that integrates a multi-embedding strategy with a Mixture-of-Experts (MoE) architecture. Specifically, to efficiently capture diverse item characteristics in a decoupled manner, we decompose the conventional single embedding matrix into several lower-dimensional embedding matrices.
arXiv Detail & Related papers (2025-10-29T08:42:15Z) - Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation [10.048652639214275]
We use single-branch neural networks equipped with weight sharing, modality sampling, and contrastive loss to provide accurate recommendations. We compare these networks with multi-branch alternatives and conduct extensive experiments on three datasets. Our results show that single-branch networks achieve competitive performance in warm-start scenarios and are significantly better in missing-modality settings.
arXiv Detail & Related papers (2025-09-23T08:58:53Z) - Efficient Multimodal Streaming Recommendation via Expandable Side Mixture-of-Experts [40.79898677069334]
Multimodal streaming recommender systems are widely deployed in real-world applications, where user interests shift over time. We propose Expandable Side Mixture-of-Experts (XSMoE), a memory-efficient framework for multimodal streaming recommendation. XSMoE attaches lightweight side-tuning modules to frozen pretrained encoders and incrementally expands them in response to evolving user feedback.
arXiv Detail & Related papers (2025-08-08T04:00:05Z) - I$^3$-MRec: Invariant Learning with Information Bottleneck for Incomplete Modality Recommendation [56.55935146424585]
We introduce I$^3$-MRec, which learns with the Information Bottleneck principle for Incomplete Modality Recommendation. By treating each modality as a distinct semantic environment, I$^3$-MRec employs invariant risk minimization (IRM) to learn preference-oriented representations. I$^3$-MRec consistently outperforms existing state-of-the-art MRS methods across various modality-missing scenarios.
arXiv Detail & Related papers (2025-08-06T09:29:50Z) - FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z) - Gated Multimodal Graph Learning for Personalized Recommendation [9.466822984141086]
Multimodal recommendation has emerged as a promising solution to alleviate the cold-start and sparsity problems in collaborative filtering. We propose RLMultimodalRec, a lightweight and modular recommendation framework that combines graph-based user modeling with adaptive multimodal item encoding.
arXiv Detail & Related papers (2025-05-30T16:57:17Z) - Spectrum-based Modality Representation Fusion Graph Convolutional Network for Multimodal Recommendation [7.627299398469962]
We propose a new Spectrum-based Modality Representation graph recommender. It aims to capture both uni-modal and fusion preferences while simultaneously suppressing modality noise. Experiments on three real-world datasets show the efficacy of our proposed model.
arXiv Detail & Related papers (2024-12-19T15:53:21Z) - DiffMM: Multi-Modal Diffusion Model for Recommendation [19.43775593283657]
We propose a novel multi-modal graph diffusion model for recommendation called DiffMM.
Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning.
arXiv Detail & Related papers (2024-06-17T17:35:54Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima [54.06000767038741]
We analyze multimodal recommender systems from the novel perspective of flat local minima.
We propose a concise yet effective gradient strategy called Mirror Gradient (MG).
We find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models.
arXiv Detail & Related papers (2024-02-17T12:27:30Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z) - Mining Stable Preferences: Adaptive Modality Decorrelation for Multimedia Recommendation [23.667430143035787]
We propose a novel MOdality DEcorrelating STable learning framework, MODEST for brevity, to learn users' stable preference.
Inspired by sample re-weighting techniques, the proposed method aims to estimate a weight for each item, such that the features from different modalities in the weighted distribution are decorrelated.
Our method can serve as a plug-and-play module for existing multimedia recommendation backbones.
arXiv Detail & Related papers (2023-06-25T09:09:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.