BiVRec: Bidirectional View-based Multimodal Sequential Recommendation
- URL: http://arxiv.org/abs/2402.17334v2
- Date: Tue, 5 Mar 2024 04:12:34 GMT
- Title: BiVRec: Bidirectional View-based Multimodal Sequential Recommendation
- Authors: Jiaxi Hu, Jingtong Gao, Xiangyu Zhao, Yuehong Hu, Yuxuan Liang, Yiqi
Wang, Ming He, Zitao Liu, Hongzhi Yin
- Abstract summary: We propose an innovative framework, BiVRec, that jointly trains the recommendation tasks in both ID and multimodal views.
BiVRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
- Score: 55.87443627659778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of multimodal information into sequential recommender systems
has attracted significant attention in recent research. In the initial stages
of multimodal sequential recommendation models, the mainstream paradigm was
ID-dominant recommendations, wherein multimodal information was fused as side
information. However, because ID-dominant approaches have limited transferability
and suffer from information intrusion, another paradigm emerged, wherein multimodal
features were employed directly for recommendation, enabling recommendation across
datasets. Nonetheless, this paradigm overlooked user ID information, resulting in low
information utilization and high training costs. To this end, we propose an
innovative framework, BiVRec, that jointly trains the recommendation tasks in
both ID and multimodal views, leveraging their synergistic relationship to
enhance recommendation performance bidirectionally. To tackle the information
heterogeneity issue, we first construct structured user interest
representations and then learn the synergistic relationship between them.
Specifically, BiVRec comprises three modules: Multi-scale Interest Embedding,
comprehensively modeling user interests by expanding user interaction sequences
with multi-scale patching; Intra-View Interest Decomposition, constructing
highly structured interest representations using carefully designed Gaussian
attention and Cluster attention; and Cross-View Interest Learning, learning the
synergistic relationship between the two recommendation views through
coarse-grained overall semantic similarity and fine-grained interest allocation
similarity. BiVRec achieves state-of-the-art performance on five datasets and
showcases various practical advantages.
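The abstract names BiVRec's three modules but gives no equations, so the PyTorch sketch below is only an illustration of the general ideas: multi-scale patching of an interaction sequence, a distance-based Gaussian attention bias, and a coarse-grained cross-view similarity objective. All class names, tensor shapes, and formulations are assumptions rather than the paper's actual implementation; Cluster attention and the fine-grained interest allocation similarity are omitted because the abstract does not describe them in enough detail.

```python
# Illustrative sketch only -- the abstract gives no equations, so the module
# names, tensor shapes, and formulations below are assumptions, not BiVRec's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleInterestEmbedding(nn.Module):
    """Expand an interaction sequence into patches at several scales (assumed design)."""

    def __init__(self, dim: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.projs = nn.ModuleList([nn.Linear(dim, dim) for _ in scales])

    def forward(self, x):                                  # x: (batch, seq_len, dim)
        patches = []
        for scale, proj in zip(self.scales, self.projs):
            # Average-pool the sequence with window = stride = scale to form
            # progressively coarser "patches" of the user's interaction history.
            pooled = F.avg_pool1d(x.transpose(1, 2), scale, scale).transpose(1, 2)
            patches.append(proj(pooled))
        return torch.cat(patches, dim=1)                   # all scales concatenated


def gaussian_attention_bias(seq_len: int, sigma: float = 2.0) -> torch.Tensor:
    """One plausible form of Gaussian attention: a distance-based bias added to
    attention logits so that nearby positions attend to each other more strongly."""
    pos = torch.arange(seq_len, dtype=torch.float32)
    dist = (pos[None, :] - pos[:, None]).abs()
    return -dist.pow(2) / (2.0 * sigma ** 2)               # (seq_len, seq_len)


def coarse_cross_view_similarity(id_view: torch.Tensor,
                                 mm_view: torch.Tensor) -> torch.Tensor:
    """Coarse-grained cross-view objective: pool each view to one vector per user
    and pull the ID-view and multimodal-view summaries together (assumed loss)."""
    id_vec = F.normalize(id_view.mean(dim=1), dim=-1)      # (batch, dim)
    mm_vec = F.normalize(mm_view.mean(dim=1), dim=-1)
    return 1.0 - (id_vec * mm_vec).sum(dim=-1).mean()      # 1 - mean cosine similarity


if __name__ == "__main__":
    ids = torch.randn(8, 16, 64)    # hypothetical ID-view sequence embeddings
    mms = torch.randn(8, 16, 64)    # hypothetical multimodal-view sequence embeddings
    embed = MultiScaleInterestEmbedding(dim=64)
    id_patches, mm_patches = embed(ids), embed(mms)        # (8, 28, 64) each
    bias = gaussian_attention_bias(id_patches.size(1))     # (28, 28)
    loss = coarse_cross_view_similarity(id_patches, mm_patches)
    print(id_patches.shape, bias.shape, float(loss))
```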
Related papers
- Bundle Recommendation with Item-level Causation-enhanced Multi-view Learning [1.901404011684453]
We present BunCa, a novel bundle recommendation approach employing item-level causation-enhanced multi-view learning.
BunCa provides comprehensive representations of users and bundles through two views: the Coherent View and the Cohesive View.
Experiments with BunCa on three benchmark datasets demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-13T07:05:27Z) - MISSRec: Pre-training and Transferring Multi-modal Interest-aware
Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation.
On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests.
On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z) - Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z) - RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided
Learning [37.067605349559]
We propose a novel Progressive Fusion Transformer called ProFormer.
It integrates single-modality information into the multimodal representation for robust RGBT tracking.
ProFormer sets a new state-of-the-art performance on RGBT210, RGBT234, LasHeR, and VTUAV datasets.
arXiv Detail & Related papers (2023-03-26T16:55:58Z) - Multi-View Clustering from the Perspective of Mutual Information [0.0]
We propose a novel model based on information theory, termed Informative Multi-View Clustering (IMVC).
IMVC extracts the common and view-specific information hidden in multi-view data and constructs a clustering-oriented comprehensive representation.
We conduct extensive experiments on six benchmark datasets, and the experimental results indicate that the proposed IMVC outperforms other methods.
arXiv Detail & Related papers (2023-02-17T07:49:27Z) - Dual Representation Learning for One-Step Clustering of Multi-View Data [30.131568561100817]
We propose a novel one-step multi-view clustering method by exploiting the dual representation of both the common and specific information of different views.
With this framework, the representation learning and clustering partition mutually benefit each other, which effectively improves clustering performance.
arXiv Detail & Related papers (2022-08-30T14:20:26Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables the IB to capture the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Multiplex Behavioral Relation Learning for Recommendation via Memory
Augmented Transformer Network [25.563806871858073]
This work proposes a Memory-Augmented Transformer Network (MATN) to enable recommendation with multiplex behavioral relation information.
In our MATN framework, we first develop a transformer-based multi-behavior relation encoder so that the learned interaction representations reflect cross-type behavior relations.
A memory attention network is further proposed to help MATN capture the contextual signals of different behavior types in a category-specific latent embedding space.
arXiv Detail & Related papers (2021-10-08T09:54:43Z) - Collaborative Attention Mechanism for Multi-View Action Recognition [75.33062629093054]
We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects attention differences among multiple views and adaptively integrates frame-level information so that the views benefit each other.
Experiments on four action datasets show that the proposed CAM achieves better results for each view and also boosts multi-view performance.
arXiv Detail & Related papers (2020-09-14T17:33:10Z) - Embedded Deep Bilinear Interactive Information and Selective Fusion for
Multi-view Learning [70.67092105994598]
We propose a novel multi-view learning framework to improve multi-view classification with respect to both of these aspects.
In particular, we train different deep neural networks to learn various intra-view representations.
Experiments on six publicly available datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-07-13T01:13:23Z)