BiVRec: Bidirectional View-based Multimodal Sequential Recommendation
- URL: http://arxiv.org/abs/2402.17334v2
- Date: Tue, 5 Mar 2024 04:12:34 GMT
- Title: BiVRec: Bidirectional View-based Multimodal Sequential Recommendation
- Authors: Jiaxi Hu, Jingtong Gao, Xiangyu Zhao, Yuehong Hu, Yuxuan Liang, Yiqi
Wang, Ming He, Zitao Liu, Hongzhi Yin
- Abstract summary: We propose an innovative framework, BiVRec, that jointly trains the recommendation tasks in both ID and multimodal views.
BiVRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
- Score: 55.87443627659778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of multimodal information into sequential recommender systems
has attracted significant attention in recent research. In the initial stages
of multimodal sequential recommendation models, the mainstream paradigm was
ID-dominant recommendations, wherein multimodal information was fused as side
information. However, because ID-dominant approaches have limited transferability
and suffer from information intrusion, another paradigm emerged, wherein multimodal
features were employed directly for recommendation, enabling recommendation across
datasets. Nonetheless, this paradigm overlooked user ID information, resulting in low
information utilization and high training costs. To this end, we propose an
innovative framework, BiVRec, that jointly trains the recommendation tasks in
both ID and multimodal views, leveraging their synergistic relationship to
enhance recommendation performance bidirectionally. To tackle the information
heterogeneity issue, we first construct structured user interest
representations and then learn the synergistic relationship between them.
Specifically, BiVRec comprises three modules: Multi-scale Interest Embedding,
comprehensively modeling user interests by expanding user interaction sequences
with multi-scale patching; Intra-View Interest Decomposition, constructing
highly structured interest representations using carefully designed Gaussian
attention and Cluster attention; and Cross-View Interest Learning, learning the
synergistic relationship between the two recommendation views through
coarse-grained overall semantic similarity and fine-grained interest allocation
similarity. BiVRec achieves state-of-the-art performance on five datasets and
showcases various practical advantages.
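The abstract names BiVRec's three modules but gives no equations, so the PyTorch sketch below is only an illustration of the general ideas: multi-scale patching of an interaction sequence, a distance-based Gaussian attention bias, and a coarse-grained cross-view similarity objective. All class names, tensor shapes, and formulations are assumptions rather than the paper's actual implementation; Cluster attention and the fine-grained interest allocation similarity are omitted because the abstract does not describe them in enough detail.

```python
# Illustrative sketch only -- the abstract gives no equations, so the module
# names, tensor shapes, and formulations below are assumptions, not BiVRec's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleInterestEmbedding(nn.Module):
    """Expand an interaction sequence into patches at several scales (assumed design)."""

    def __init__(self, dim: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.projs = nn.ModuleList([nn.Linear(dim, dim) for _ in scales])

    def forward(self, x):                                  # x: (batch, seq_len, dim)
        patches = []
        for scale, proj in zip(self.scales, self.projs):
            # Average-pool the sequence with window = stride = scale to form
            # progressively coarser "patches" of the user's interaction history.
            pooled = F.avg_pool1d(x.transpose(1, 2), scale, scale).transpose(1, 2)
            patches.append(proj(pooled))
        return torch.cat(patches, dim=1)                   # all scales concatenated


def gaussian_attention_bias(seq_len: int, sigma: float = 2.0) -> torch.Tensor:
    """One plausible form of Gaussian attention: a distance-based bias added to
    attention logits so that nearby positions attend to each other more strongly."""
    pos = torch.arange(seq_len, dtype=torch.float32)
    dist = (pos[None, :] - pos[:, None]).abs()
    return -dist.pow(2) / (2.0 * sigma ** 2)               # (seq_len, seq_len)


def coarse_cross_view_similarity(id_view: torch.Tensor,
                                 mm_view: torch.Tensor) -> torch.Tensor:
    """Coarse-grained cross-view objective: pool each view to one vector per user
    and pull the ID-view and multimodal-view summaries together (assumed loss)."""
    id_vec = F.normalize(id_view.mean(dim=1), dim=-1)      # (batch, dim)
    mm_vec = F.normalize(mm_view.mean(dim=1), dim=-1)
    return 1.0 - (id_vec * mm_vec).sum(dim=-1).mean()      # 1 - mean cosine similarity


if __name__ == "__main__":
    ids = torch.randn(8, 16, 64)    # hypothetical ID-view sequence embeddings
    mms = torch.randn(8, 16, 64)    # hypothetical multimodal-view sequence embeddings
    embed = MultiScaleInterestEmbedding(dim=64)
    id_patches, mm_patches = embed(ids), embed(mms)        # (8, 28, 64) each
    bias = gaussian_attention_bias(id_patches.size(1))     # (28, 28)
    loss = coarse_cross_view_similarity(id_patches, mm_patches)
    print(id_patches.shape, bias.shape, float(loss))
```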
Related papers
- Bundle Recommendation with Item-level Causation-enhanced Multi-view Learning [1.901404011684453]
We present BunCa, a novel bundle recommendation approach employing item-level causation-enhanced multi-view learning.
BunCa provides comprehensive representations of users and bundles through two views: the Coherent View and the Cohesive View.
Experiments with BunCa on three benchmark datasets demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-13T07:05:27Z) - MISSRec: Pre-training and Transferring Multi-modal Interest-aware
Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation.
On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests.
On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z) - Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z) - RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided
Learning [37.067605349559]
We propose a novel Progressive Fusion Transformer called ProFormer.
It integrates single-modality information into the multimodal representation for robust RGBT tracking.
ProFormer sets a new state-of-the-art performance on RGBT210, RGBT234, LasHeR, and VTUAV datasets.
arXiv Detail & Related papers (2023-03-26T16:55:58Z) - Multi-View Clustering from the Perspective of Mutual Information [0.0]
We propose a novel model based on information theory, termed Informative Multi-View Clustering (IMVC).
IMVC extracts the common and view-specific information hidden in multi-view data and constructs a clustering-oriented comprehensive representation.
We conduct extensive experiments on six benchmark datasets, and the experimental results indicate that the proposed IMVC outperforms other methods.
arXiv Detail & Related papers (2023-02-17T07:49:27Z) - Dual Representation Learning for One-Step Clustering of Multi-View Data [30.131568561100817]
We propose a novel one-step multi-view clustering method by exploiting the dual representation of both the common and specific information of different views.
With this framework, the representation learning and clustering partition mutually benefit each other, which effectively improves clustering performance.
arXiv Detail & Related papers (2022-08-30T14:20:26Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables the IB to capture the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Multiplex Behavioral Relation Learning for Recommendation via Memory
Augmented Transformer Network [25.563806871858073]
This work proposes a Memory-Augmented Transformer Network (MATN) to enable recommendation with multiplex behavioral relation information.
In our MATN framework, we first develop a transformer-based multi-behavior relation encoder so that the learned interaction representations reflect cross-type behavior relations.
A memory attention network is further proposed to help MATN capture the contextual signals of different behavior types in a category-specific latent embedding space.
arXiv Detail & Related papers (2021-10-08T09:54:43Z) - Collaborative Attention Mechanism for Multi-View Action Recognition [75.33062629093054]
We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects attention differences among multiple views and adaptively integrates frame-level information so that the views benefit each other.
Experiments on four action datasets show that the proposed CAM achieves better results for each view and also boosts multi-view performance.
arXiv Detail & Related papers (2020-09-14T17:33:10Z) - Embedded Deep Bilinear Interactive Information and Selective Fusion for
Multi-view Learning [70.67092105994598]
We propose a novel multi-view learning framework to improve multi-view classification with respect to both of these aspects.
In particular, we train different deep neural networks to learn various intra-view representations.
Experiments on six publicly available datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-07-13T01:13:23Z)