Multi-Modal Multi-Behavior Sequential Recommendation with Conditional Diffusion-Based Feature Denoising
- URL: http://arxiv.org/abs/2508.05352v1
- Date: Thu, 07 Aug 2025 12:58:34 GMT
- Title: Multi-Modal Multi-Behavior Sequential Recommendation with Conditional Diffusion-Based Feature Denoising
- Authors: Xiaoxi Cui, Weihai Lu, Yu Tong, Yiheng Li, Zhejun Zhao
- Abstract summary: This paper focuses on the problem of multi-modal multi-behavior sequential recommendation. We propose a novel Multi-Modal Multi-Behavior Sequential Recommendation model (M$^3$BSR). Experimental results indicate that M$^3$BSR significantly outperforms existing state-of-the-art methods on benchmark datasets.
- Score: 1.4207530018625354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential recommendation systems utilize historical user interactions to predict user preferences. Effectively integrating diverse user behavior patterns with the rich multimodal information of items to enhance the accuracy of sequential recommendation is an emerging and challenging research direction. This paper focuses on the problem of multi-modal multi-behavior sequential recommendation, aiming to address the following challenges: (1) the lack of effective characterization of modal preferences across different behaviors, as user attention to different item modalities varies depending on the behavior; (2) the difficulty of effectively mitigating implicit noise in user behavior, such as unintended actions like accidental clicks; (3) the inability to handle modality noise in multi-modal representations, which further impacts the accurate modeling of user preferences. To tackle these issues, we propose a novel Multi-Modal Multi-Behavior Sequential Recommendation model (M$^3$BSR). This model first removes noise in multi-modal representations using a Conditional Diffusion Modality Denoising Layer. Subsequently, it utilizes deep behavioral information (e.g., purchases) to guide the denoising of shallow behavioral data (e.g., clicks), thereby alleviating the impact of noise in implicit feedback through Conditional Diffusion Behavior Denoising. Finally, by introducing a Multi-Expert Interest Extraction Layer, M$^3$BSR explicitly models the common and specific interests across behaviors and modalities to enhance recommendation performance. Experimental results indicate that M$^3$BSR significantly outperforms existing state-of-the-art methods on benchmark datasets.
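As a concrete illustration of the core mechanism the abstract describes, below is a minimal PyTorch sketch of one conditional diffusion denoising training step over item modality features. The module names, feature shapes, noise schedule, and conditioning scheme are assumptions for illustration, not the authors' implementation.

```python
# A minimal, self-contained sketch of conditional diffusion feature denoising
# in the spirit of a Conditional Diffusion Modality Denoising Layer.
# All module names, shapes, and the conditioning scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to a modality feature, conditioned on a
    context vector (e.g., a behavior-aware user embedding) and the step t."""
    def __init__(self, feat_dim=64, cond_dim=64, hidden=128, num_steps=50):
        super().__init__()
        self.t_embed = nn.Embedding(num_steps, hidden)
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim + hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x_t, cond, t):
        return self.net(torch.cat([x_t, cond, self.t_embed(t)], dim=-1))

num_steps = 50
betas = torch.linspace(1e-4, 0.02, num_steps)      # assumed noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative \bar{alpha}_t

def diffusion_loss(model, x0, cond):
    """One DDPM-style training step: corrupt x0, then predict the noise."""
    t = torch.randint(0, num_steps, (x0.size(0),))
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward process
    return F.mse_loss(model(x_t, cond, t), noise)

model = ConditionalDenoiser()
x0 = torch.randn(8, 64)    # stand-in for item image/text features
cond = torch.randn(8, 64)  # stand-in for a behavior-conditioned context
loss = diffusion_loss(model, x0, cond)
loss.backward()
```

At inference time such a model would be run in reverse, iteratively subtracting predicted noise from the corrupted features; per the abstract, the same machinery is also applied to behavior sequences, with deep behaviors guiding the denoising of shallow ones.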
Related papers
- Multimodal Difference Learning for Sequential Recommendation [5.243083216855681]
We argue that user interests and item relationships vary across different modalities.
We propose MDSRec, a novel Multimodal Difference Learning framework for Sequential Recommendation.
Results on five real-world datasets demonstrate the superiority of MDSRec over state-of-the-art baselines.
arXiv Detail & Related papers (2024-12-11T05:08:19Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, the pre-trained model is tuned in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
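The abstract does not specify how the Efficient Behavior Miner filters noise "at multiple time scales"; one plausible, deliberately simple reading is multi-scale temporal smoothing of a behavior-embedding sequence, sketched below. The shapes and the choice of causal moving-average filters are assumptions, not DPCPL's actual design.

```python
# A hedged illustration of filtering behavior noise at multiple time scales:
# smooth a sequence of behavior embeddings with causal moving averages of
# several window sizes, then blend the scales.
import torch
import torch.nn.functional as F

def multi_scale_smooth(seq, windows=(1, 3, 7)):
    """seq: (batch, length, dim) behavior-embedding sequence.
    Returns the mean of causal moving averages at each window size."""
    x = seq.transpose(1, 2)                  # (batch, dim, length)
    scales = []
    for w in windows:
        padded = F.pad(x, (w - 1, 0))        # causal: only look backwards
        scales.append(F.avg_pool1d(padded, kernel_size=w, stride=1))
    return torch.stack(scales).mean(dim=0).transpose(1, 2)

seq = torch.randn(4, 20, 32)                 # 4 users, 20 steps, dim 32
denoised = multi_scale_smooth(seq)
print(denoised.shape)                        # torch.Size([4, 20, 32])
```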
- Behavior-Contextualized Item Preference Modeling for Multi-Behavior Recommendation [30.715182718492244]
This paper introduces a novel approach, Behavior-Contextualized Item Preference Modeling (BCIPM), for multi-behavior recommendation.
Our proposed Behavior-Contextualized Item Preference Network discerns and learns users' specific item preferences within each behavior.
It then considers only those preferences relevant to the target behavior for final recommendations, significantly reducing noise from auxiliary behaviors.
arXiv Detail & Related papers (2024-04-28T12:46:36Z)
- TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content [21.90660366765994]
We propose a trustworthy sequential recommendation method that learns from noisy user-generated multi-modal content.
Specifically, we capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference.
In addition, we design a trustworthy decision mechanism that integrates the subjective user perspective with the objective item perspective.
arXiv Detail & Related papers (2024-04-26T08:23:36Z)
- LD4MRec: Simplifying and Powering Diffusion Model for Multimedia Recommendation [6.914898966090197]
We propose a Light Diffusion model for Multimedia Recommendation (LD4MRec).
LD4MRec employs a forward-free inference strategy, which directly predicts future behaviors from observed noisy behaviors.
Experiments conducted on three real-world datasets demonstrate the effectiveness of LD4MRec.
arXiv Detail & Related papers (2023-09-27T02:12:41Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
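The implicit manipulation query is described only as adaptively aggregating global contextual cues within each modality. A generic learnable-query cross-attention pooling, sketched below with assumed shapes and head count, captures that general idea; it is not the paper's architecture.

```python
# A learnable query attends over all tokens of one modality and pools them
# into a single global summary vector. Dimensions are assumptions.
import torch
import torch.nn as nn

class QueryPooling(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learnable query
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, tokens):
        """tokens: (batch, n_tokens, dim) features of one modality."""
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)
        return pooled.squeeze(1)               # (batch, dim) global summary

pool = QueryPooling()
image_tokens = torch.randn(2, 49, 64)          # e.g., 7x7 patch features
print(pool(image_tokens).shape)                # torch.Size([2, 64])
```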
- Diffusion Recommender Model [85.9640416600725]
We propose a novel Diffusion Recommender Model (DiffRec) to learn the generative process in a denoising manner.
To retain personalized information in user interactions, DiffRec reduces the added noise and avoids corrupting users' interactions into pure noise as in image synthesis.
arXiv Detail & Related papers (2023-04-11T04:31:00Z)
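The distinctive detail in this summary is that DiffRec keeps the added noise small enough that interactions are never corrupted into pure noise. A toy rendering of that reduced-noise forward process, with an assumed schedule:

```python
# Corrupt a user's interaction vector only partially, so the personalized
# signal is never destroyed. The schedule values are invented for the demo.
import torch

num_steps = 20
# Deliberately small noise schedule: even at the final step, \bar{alpha}_T
# stays well above 0, unlike image diffusion where it approaches 0.
betas = torch.linspace(1e-4, 5e-3, num_steps)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

interactions = torch.tensor([[1., 0., 1., 1., 0.]])  # one user's history
t = num_steps - 1                                    # most-corrupted step
noise = torch.randn_like(interactions)
x_t = alphas_bar[t].sqrt() * interactions + (1 - alphas_bar[t]).sqrt() * noise
print(alphas_bar[t])  # close to 1: personalization is retained
print(x_t)            # a lightly perturbed, not pure-noise, vector
```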
- Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the inherent spirit of iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
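The loop this summary describes, predictions generated from random noise and iteratively refined with video features as conditions, can be sketched generically. The denoiser below is an untrained placeholder with assumed dimensions; a real system would learn both the model and the update rule.

```python
# Frame-wise action logits start as random noise and are refined step by
# step, conditioned on video features. Everything here is a toy stand-in.
import torch
import torch.nn as nn

frames, classes, feat_dim, steps = 100, 10, 32, 25
denoiser = nn.Sequential(nn.Linear(classes + feat_dim, 64),
                         nn.ReLU(),
                         nn.Linear(64, classes))

video_feats = torch.randn(frames, feat_dim)   # conditioning features
preds = torch.randn(frames, classes)          # start from pure noise

with torch.no_grad():
    for _ in range(steps):
        refined = denoiser(torch.cat([preds, video_feats], dim=-1))
        preds = 0.9 * preds + 0.1 * refined   # simple damped refinement

segmentation = preds.argmax(dim=-1)           # per-frame action labels
print(segmentation.shape)                     # torch.Size([100])
```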
- Coarse-to-Fine Knowledge-Enhanced Multi-Interest Learning Framework for Multi-Behavior Recommendation [52.89816309759537]
Multiple types of behaviors (e.g., clicking, adding to cart, purchasing) exist in most real-world recommendation scenarios.
State-of-the-art multi-behavior models learn behavior dependencies indiscriminately, taking all historical interactions as input.
We propose a novel Coarse-to-fine Knowledge-enhanced Multi-interest Learning framework to learn shared and behavior-specific interests for different behaviors.
arXiv Detail & Related papers (2022-08-03T05:28:14Z)
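Both this framework and M$^3$BSR's Multi-Expert Interest Extraction Layer revolve around separating shared from behavior-specific interests. A hedged sketch of that separation follows; the gating scheme and layer sizes are invented, not either paper's architecture.

```python
# One shared expert serves every behavior, each behavior also gets its own
# expert, and a learned gate blends the two views of the user.
import torch
import torch.nn as nn

class SharedSpecificExperts(nn.Module):
    def __init__(self, dim=64, num_behaviors=3):
        super().__init__()
        self.shared = nn.Linear(dim, dim)                       # common interest
        self.specific = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_behaviors))  # per-behavior
        self.gate = nn.Linear(dim, 2)

    def forward(self, user_repr, behavior_id):
        shared = self.shared(user_repr)
        specific = self.specific[behavior_id](user_repr)
        w = torch.softmax(self.gate(user_repr), dim=-1)         # (batch, 2)
        return w[:, :1] * shared + w[:, 1:] * specific

experts = SharedSpecificExperts()
user_repr = torch.randn(8, 64)
purchase_interest = experts(user_repr, behavior_id=2)
print(purchase_interest.shape)                                  # torch.Size([8, 64])
```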
- Sequential Recommendation with Self-Attentive Multi-Adversarial Network [101.25533520688654]
We present a Multi-Factor Generative Adversarial Network (MFGAN) for explicitly modeling the effect of context information on sequential recommendation.
Our framework is flexible to incorporate multiple kinds of factor information, and is able to trace how each factor contributes to the recommendation decision over time.
arXiv Detail & Related papers (2020-05-21T12:28:59Z)
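Per the summary, MFGAN's appeal is that factor-specific discriminators make each factor's contribution to a recommendation traceable over time. A toy version with invented factor names and dimensions:

```python
# Each context factor gets its own small discriminator; the per-factor
# scores make the contribution of each factor inspectable. All names and
# shapes are invented for illustration.
import torch
import torch.nn as nn

factors = ["category", "price", "brand"]
dim = 32
discriminators = nn.ModuleDict({
    name: nn.Sequential(nn.Linear(dim * 2, 32), nn.ReLU(),
                        nn.Linear(32, 1), nn.Sigmoid())
    for name in factors
})

seq_repr = torch.randn(1, dim)                # encoded user sequence
factor_feats = {name: torch.randn(1, dim) for name in factors}

scores = {name: d(torch.cat([seq_repr, factor_feats[name]], dim=-1)).item()
          for name, d in discriminators.items()}
print(scores)  # per-factor plausibility of the recommended next item
```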