Decoupled Side Information Fusion for Sequential Recommendation
- URL: http://arxiv.org/abs/2204.11046v1
- Date: Sat, 23 Apr 2022 10:53:36 GMT
- Title: Decoupled Side Information Fusion for Sequential Recommendation
- Authors: Yueqi Xie, Peilin Zhou, Sunghun Kim
- Abstract summary: We propose Decoupled Side Information Fusion for Sequential Recommendation (DIF-SR).
It moves the side information from the input to the attention layer and decouples the attention calculation of various side information and item representation.
Our proposed solution stably outperforms state-of-the-art SR models.
- Score: 6.515279047538104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Side information fusion for sequential recommendation (SR) aims to
effectively leverage various side information to enhance the performance of
next-item prediction. Most state-of-the-art methods build on self-attention
networks and focus on exploring various solutions to integrate the item
embedding and side information embeddings before the attention layer. However,
our analysis shows that the early integration of various types of embeddings
limits the expressiveness of attention matrices due to a rank bottleneck and
constrains the flexibility of gradients. Also, it involves mixed correlations
among the different heterogeneous information resources, which brings extra
disturbance to attention calculation. Motivated by this, we propose Decoupled
Side Information Fusion for Sequential Recommendation (DIF-SR), which moves the
side information from the input to the attention layer and decouples the
attention calculation of various side information and item representation. We
theoretically and empirically show that the proposed solution allows
higher-rank attention matrices and flexible gradients to enhance the modeling
capacity of side information fusion. Also, auxiliary attribute predictors are
proposed to further activate the beneficial interaction between side
information and item representation learning. Extensive experiments on four
real-world datasets demonstrate that our proposed solution stably outperforms
state-of-the-art SR models. Further studies show that our proposed solution can
be readily incorporated into current attention-based SR models and
significantly boost performance. Our source code is available at
https://github.com/AIM-SE/DIF-SR.
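To make the decoupling concrete, below is a minimal single-head sketch of how the attention calculation can be split: the item embeddings and each side-information embedding produce their own attention logits, the logits are summed before the softmax, and only the item values are aggregated. Because the pre-softmax logit matrix is a sum of several products rather than a single product of fused embeddings, it can reach a higher rank, which is the intuition behind the paper's rank argument. All names here (DecoupledAttention, side_dims, and so on) are illustrative assumptions, not the authors' implementation; the official code in the repository above may differ (e.g., multi-head attention, causal masking, per-attribute scaling).

```python
# Minimal sketch of decoupled side-information attention (single head, no masking).
# Hypothetical names; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupledAttention(nn.Module):
    def __init__(self, hidden_dim, side_dims):
        super().__init__()
        # Item queries/keys/values live in the main hidden space.
        self.item_q = nn.Linear(hidden_dim, hidden_dim)
        self.item_k = nn.Linear(hidden_dim, hidden_dim)
        self.item_v = nn.Linear(hidden_dim, hidden_dim)
        # Each side attribute keeps its own (typically smaller) query/key
        # projections, so its attention logits are computed independently
        # of the item embeddings instead of being fused at the input.
        self.side_q = nn.ModuleList(nn.Linear(d, d) for d in side_dims)
        self.side_k = nn.ModuleList(nn.Linear(d, d) for d in side_dims)
        self.scale = hidden_dim ** 0.5

    def forward(self, item_emb, side_embs):
        # item_emb: (batch, seq_len, hidden_dim)
        # side_embs: list of (batch, seq_len, side_dim_i) tensors
        logits = self.item_q(item_emb) @ self.item_k(item_emb).transpose(-2, -1)
        for emb, wq, wk in zip(side_embs, self.side_q, self.side_k):
            # Decoupled fusion: side information contributes additive attention
            # logits rather than being mixed into the item keys and queries.
            logits = logits + wq(emb) @ wk(emb).transpose(-2, -1)
        attn = F.softmax(logits / self.scale, dim=-1)
        # Only item values are aggregated; side information only steers attention.
        return attn @ self.item_v(item_emb)


# Toy usage: 2 sequences of length 5, hidden size 64, and two side attributes
# (say, category and brand) with embedding sizes 16 and 8.
layer = DecoupledAttention(64, [16, 8])
items = torch.randn(2, 5, 64)
sides = [torch.randn(2, 5, 16), torch.randn(2, 5, 8)]
print(layer(items, sides).shape)  # torch.Size([2, 5, 64])
```

Keeping the side projections separate also means their gradients do not pass through a fused input embedding, which corresponds to the "flexible gradients" property claimed in the abstract.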
Related papers
- KGIF: Optimizing Relation-Aware Recommendations with Knowledge Graph Information Fusion [16.971592142597544]
This study introduces a specialized framework designed to merge entity and relation embeddings explicitly through a tailored self-attention mechanism.
This explicit fusion enhances the interplay between user-item interactions and item-attribute relationships, providing a nuanced balance between user-centric and item-centric representations.
The contributions of this work include an innovative method for explicit information fusion, improved robustness for sparse knowledge graphs, and the ability to generate explainable recommendations through interpretable path visualization.
arXiv Detail & Related papers (2025-01-07T22:19:15Z)
- Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion [25.140475569677758]
Multimodal image fusion aims to integrate information from different modalities to obtain a comprehensive image.
Existing methods tend to prioritize natural image fusion and focus on information complementarity and network training strategies.
This paper dissects the significant differences between the two tasks regarding fusion goals, statistical properties, and data distribution.
arXiv Detail & Related papers (2024-11-15T08:36:24Z)
- Learning Accurate and Enriched Features for Stereo Image Super-Resolution [0.0]
Stereo image super-resolution (stereoSR) aims to enhance the quality of super-resolution results by incorporating complementary information from an alternative view.
We propose a mixed-scale selective fusion network (MSSFNet) to preserve precise spatial details and incorporate abundant contextual information.
MSSFNet achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2024-06-23T03:34:17Z)
- BiVRec: Bidirectional View-based Multimodal Sequential Recommendation [55.87443627659778]
We propose an innovative framework, BivRec, that jointly trains the recommendation tasks in both ID and multimodal views.
BivRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
arXiv Detail & Related papers (2024-02-27T09:10:41Z)
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention [15.643176705932396]
We introduce a joint cross-attentional model for A-V fusion that extracts the salient features across A-V modalities.
It computes the cross-attention weights based on correlation between the joint feature representation and that of the individual modalities.
Results indicate that our joint cross-attentional A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches.
arXiv Detail & Related papers (2022-09-19T15:01:55Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition [13.994609732846344]
The most effective techniques for emotion recognition efficiently leverage diverse and complementary sources of information.
We introduce a cross-attentional fusion approach to extract the salient features across audio-visual (A-V) modalities.
Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches.
arXiv Detail & Related papers (2021-11-09T16:01:56Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models adopt feature fusion strategies but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- Collaborative Attention Mechanism for Multi-View Action Recognition [75.33062629093054]
We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects attention differences among the views and adaptively integrates frame-level information so that the views benefit each other.
Experiments on four action datasets show that the proposed CAM achieves better results for each view and also boosts multi-view performance.
arXiv Detail & Related papers (2020-09-14T17:33:10Z)
- Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-aided Attentive Feature Aggregation module (MG-RAFA).
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)