Addressing Missing Data Issue for Diffusion-based Recommendation
- URL: http://arxiv.org/abs/2505.12283v1
- Date: Sun, 18 May 2025 07:45:46 GMT
- Title: Addressing Missing Data Issue for Diffusion-based Recommendation
- Authors: Wenyu Mao, Zhengyi Yang, Jiancan Wu, Haozhe Liu, Yancheng Yuan, Xiang Wang, Xiangnan He
- Abstract summary: We propose a novel dual-side Thompson sampling-based Diffusion Model (TDM). TDM simulates extra missing data in the guidance signals and allows diffusion models to handle existing missing data through extrapolation. Experiments and theoretical analysis validate the effectiveness of TDM in addressing missing data in sequential recommendations.
- Score: 26.605773432154518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have shown significant potential in generating oracle items that best match user preference with guidance from user historical interaction sequences. However, the quality of guidance is often compromised by unpredictable missing data in observed sequences, leading to suboptimal item generation. Since missing data is uncertain in both occurrence and content, recovering it is impractical and may introduce additional errors. To tackle this challenge, we propose a novel dual-side Thompson sampling-based Diffusion Model (TDM), which simulates extra missing data in the guidance signals and allows diffusion models to handle existing missing data through extrapolation. To preserve user preference evolution in sequences despite the extra missing data, we introduce Dual-side Thompson Sampling, which implements the simulation with two probability models, sampling by exploiting user preference from both item continuity and sequence stability. TDM strategically removes items from sequences based on dual-side Thompson sampling and treats these edited sequences as guidance for diffusion models, enhancing the models' robustness to missing data through consistency regularization. Additionally, to improve generation efficiency, TDM is implemented under the denoising diffusion implicit models (DDIM) framework to accelerate the reverse process. Extensive experiments and theoretical analysis validate the effectiveness of TDM in addressing missing data in sequential recommendations.
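The dual-side removal step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Beta parameterization, the 0.5 drop gate, and all names (`thompson_remove`, the `alpha_*`/`beta_*` arrays) are assumptions made for the example.

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

def thompson_remove(seq, alpha_cont, beta_cont, alpha_stab, beta_stab):
    """Simulate extra missing data by removing items from a user sequence.

    Two per-position Beta posteriors (a hypothetical parameterization) score
    each item from the item-continuity side and the sequence-stability side;
    an item is dropped only when BOTH sampled scores fall below 0.5, loosely
    mimicking the dual-side gate described in the abstract.
    """
    kept = []
    for i, item in enumerate(seq):
        p_cont = rng.betavariate(alpha_cont[i], beta_cont[i])  # continuity sample
        p_stab = rng.betavariate(alpha_stab[i], beta_stab[i])  # stability sample
        if not (p_cont < 0.5 and p_stab < 0.5):  # keep unless both sides vote "drop"
            kept.append(item)
    return kept

# Per the abstract, edited sequences like these would then serve as guidance
# for the diffusion model, with a consistency regularizer tying generations
# from the original and edited sequences together.
```

High `alpha` values make an item's sampled scores concentrate near 1 (item almost always kept), while high `beta` values concentrate them near 0 (item almost always dropped), so the sampling trades off exploration against preserving the observed preference signal.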
Related papers
- ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model [5.24776944932192]
We introduce the Information-Theoretic Discrete Poisson Diffusion Model (ItDPDM), inspired by the photon arrival process. Central to our approach is an information-theoretic Poisson Reconstruction Loss (PRL) that has a provable exact relationship with the true data likelihood. ItDPDM attains superior likelihood estimates and competitive generation quality, demonstrating a proof of concept for distribution-robust discrete generative modeling.
arXiv Detail & Related papers (2025-05-08T09:29:05Z)
- Joint Models for Handling Non-Ignorable Missing Data using Bayesian Additive Regression Trees: Application to Leaf Photosynthetic Traits Data [0.0]
Dealing with missing data poses significant challenges in predictive analysis. In cases where the data are missing not at random, jointly modeling the data and missing data indicators is essential. We propose two methods under a selection model framework for handling data with missingness.
arXiv Detail & Related papers (2024-12-19T15:26:55Z)
- Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model [66.91323540178739]
Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior.
We revisit SR from a novel information-theoretic perspective and find that sequential modeling methods fail to adequately capture the randomness and unpredictability of user behavior.
Inspired by fuzzy information processing theory, this paper introduces the fuzzy sets of interaction sequences to overcome the limitations and better capture the evolution of users' real interests.
arXiv Detail & Related papers (2024-10-31T14:52:01Z)
- Dual Conditional Diffusion Models for Sequential Recommendation [63.82152785755723]
We propose Dual Conditional Diffusion Models for Sequential Recommendation (DCRec). DCRec integrates implicit and explicit information by embedding dual conditions into both the forward and reverse diffusion processes. This allows the model to retain valuable sequential and contextual information while leveraging explicit user-item interactions to guide the recommendation process.
arXiv Detail & Related papers (2024-10-29T11:51:06Z)
- Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning.
Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training.
To further improve the generation performance, we employ a class-wise reweighting approach.
arXiv Detail & Related papers (2024-09-27T20:21:19Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- Self-Supervision Improves Diffusion Models for Tabular Data Imputation [20.871219616589986]
This paper introduces an advanced diffusion model named the Self-supervised imputation Diffusion Model (SimpDM for brevity).
To mitigate sensitivity to noise, we introduce a self-supervised alignment mechanism that aims to regularize the model, ensuring consistent and stable imputation predictions.
We also introduce a carefully devised state-dependent data augmentation strategy within SimpDM, enhancing the robustness of the diffusion model when dealing with limited data.
arXiv Detail & Related papers (2024-07-25T13:06:30Z)
- DiffPuter: Empowering Diffusion Models for Missing Data Imputation [56.48119008663155]
This paper introduces DiffPuter, a tailored diffusion model combined with the Expectation-Maximization (EM) algorithm for missing data imputation. Our theoretical analysis shows that DiffPuter's training step corresponds to the maximum likelihood estimation of data density. Our experiments show that DiffPuter achieves an average improvement of 6.94% in MAE and 4.78% in RMSE compared to the most competitive existing method.
arXiv Detail & Related papers (2024-05-31T08:35:56Z)
- DiffImpute: Tabular Data Imputation With Denoising Diffusion Probabilistic Model [9.908561639396273]
We propose DiffImpute, a novel Denoising Diffusion Probabilistic Model (DDPM) for tabular data imputation.
It produces credible imputations for missing entries without undermining the authenticity of the existing data.
It can be applied to various settings of Missing Completely At Random (MCAR) and Missing At Random (MAR).
arXiv Detail & Related papers (2024-03-20T08:45:31Z)
- Diffusion Augmentation for Sequential Recommendation [47.43402785097255]
We propose Diffusion Augmentation for Sequential Recommendation (DiffuASR) for higher-quality generation.
The augmented dataset by DiffuASR can be used to train the sequential recommendation models directly, free from complex training procedures.
We conduct extensive experiments on three real-world datasets with three sequential recommendation models.
arXiv Detail & Related papers (2023-09-22T13:31:34Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have known limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.