Related papers: Hypercomplex Prompt-aware Multimodal Recommendation

Hypercomplex Prompt-aware Multimodal Recommendation

URL: http://arxiv.org/abs/2508.10753v1
Date: Thu, 14 Aug 2025 15:36:00 GMT
Title: Hypercomplex Prompt-aware Multimodal Recommendation
Authors: Zheyu Chen, Jinfeng Xu, Hewei Wang, Shuo Yang, Zitong Wan, Haibo Hu,
Abstract summary: We propose HPMRec, a novel Hypercomplex Prompt-aware Multimodal Recommendation framework.<n>We show that HPMRec achieves state-of-the-art recommendation performance in experiments on four public datasets.
Score: 6.862998546677475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern recommender systems face critical challenges in handling information overload while addressing the inherent limitations of multimodal representation learning. Existing methods suffer from three fundamental limitations: (1) restricted ability to represent rich multimodal features through a single representation, (2) existing linear modality fusion strategies ignore the deep nonlinear correlations between modalities, and (3) static optimization methods failing to dynamically mitigate the over-smoothing problem in graph convolutional network (GCN). To overcome these limitations, we propose HPMRec, a novel Hypercomplex Prompt-aware Multimodal Recommendation framework, which utilizes hypercomplex embeddings in the form of multi-components to enhance the representation diversity of multimodal features. HPMRec adopts the hypercomplex multiplication to naturally establish nonlinear cross-modality interactions to bridge semantic gaps, which is beneficial to explore the cross-modality features. HPMRec also introduces the prompt-aware compensation mechanism to aid the misalignment between components and modality-specific features loss, and this mechanism fundamentally alleviates the over-smoothing problem. It further designs self-supervised learning tasks that enhance representation diversity and align different modalities. Extensive experiments on four public datasets show that HPMRec achieves state-of-the-art recommendation performance.

Related papers

CLEAR: Null-Space Projection for Cross-Modal De-Redundancy in Multimodal Recommendation [22.71702128773632]
Multimodal recommendation has emerged as an effective paradigm for enhancing collaborative filtering by incorporating heterogeneous content modalities.<n>We propose CLEAR, a cross-modal de-redundancy approach for multimodal recommendation.<n> CLEAR reshapes the representation space by suppressing redundant cross-modal components while preserving modality-specific information.
arXiv Detail & Related papers (2026-03-02T07:06:56Z)
Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems [17.3780399150554]
This paper proposes an end-to-end centralized decision-making framework based on sequence-to-sequence, named Multi-Agent Pointer Transformer (MAPT)<n>MAPT significantly outperforms existing baseline methods in terms of performance and substantial computational time advantages compared to classical operations research methods.
arXiv Detail & Related papers (2025-11-21T17:32:10Z)
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
I$^3$-MRec: Invariant Learning with Information Bottleneck for Incomplete Modality Recommendation [56.55935146424585]
We introduce textbfI$3$-MRec, which learns with textbfInformation bottleneck principle for textbfIncomplete textbfModality textbfRecommendation.<n>By treating each modality as a distinct semantic environment, I$3$-MRec employs invariant risk minimization (IRM) to learn preference-oriented representations.<n>I$3$-MRec consistently outperforms existing state-of-the-art MRS methods across various modality-missing scenarios
arXiv Detail & Related papers (2025-08-06T09:29:50Z)
Principled Multimodal Representation Learning [70.60542106731813]
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities.<n>Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain.<n>We propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities.
arXiv Detail & Related papers (2025-07-23T09:12:25Z)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose textbfFindRec (textbfFlexible unified textbfinformation textbfdisentanglement for multi-modal sequential textbfRecommendation)<n>A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams.<n>A cross-modal expert routing mechanism that adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models.<n>MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z)
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection [0.0]
Multi-modal learning has emerged as a crucial research direction.<n>Existing approaches often suffer from insufficient cross-modal interactions and rigid fusion strategies.<n>We propose Co-AttenDWG, co-attention with dimension-wise gating, and expert fusion.<n>We show that Co-AttenDWG achieves state-of-the-art performance and superior cross-modal alignment.
arXiv Detail & Related papers (2025-05-25T07:26:00Z)
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding. We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL. UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions [40.70793282367128]
We propose Multimodal Contrastive Learning for Multimodal Review Helpfulness Prediction (MRHP) problem. In addition, we introduce Adaptive Weighting scheme for our contrastive learning approach. Finally, we propose Multimodal Interaction module to address the unalignment nature of multimodal data.
arXiv Detail & Related papers (2022-11-07T13:05:56Z)
Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis [47.29528724322795]
Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently. Despite significant progress, there are still two major challenges on the way towards robust MSA. We propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR)
arXiv Detail & Related papers (2022-08-16T08:02:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.