Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Calibration
- URL: http://arxiv.org/abs/2504.14214v1
- Date: Sat, 19 Apr 2025 07:37:03 GMT
- Title: Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Calibration
- Authors: Hongji Li, Hanwen Du, Youhua Li, Junchen Fu, Chunxiao Li, Ziyi Zhuang, Jiakang Li, Yongxin Ni
- Abstract summary: We propose a universal guided in-sync distillation denoising framework for multi-modal recommendation (GUIDER). Specifically, GUIDER uses a re-calibration strategy to identify clean and noisy interactions from modal content. It incorporates a Denoising Bayesian Personalized Ranking (DBPR) loss function to handle implicit user feedback.
- Score: 3.6854332833964745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The surge in multimedia content has led to the development of Multi-Modal Recommender Systems (MMRecs), which use diverse modalities such as text, images, videos, and audio for more personalized recommendations. However, MMRecs struggle with noisy data caused by misalignment among modal content and the gap between modal semantics and recommendation semantics. Traditional denoising methods are inadequate due to the complexity of multi-modal data. To address this, we propose a universal guided in-sync distillation denoising framework for multi-modal recommendation (GUIDER), designed to improve MMRecs by denoising user feedback. Specifically, GUIDER uses a re-calibration strategy to identify clean and noisy interactions from modal content. It incorporates a Denoising Bayesian Personalized Ranking (DBPR) loss function to handle implicit user feedback. Finally, it applies a denoising knowledge distillation objective based on Optimal Transport distance to guide the alignment from modality representations to recommendation semantics. GUIDER can be seamlessly integrated into existing MMRecs methods as a plug-and-play solution. Experimental results on four public datasets demonstrate its effectiveness and generalizability. Our source code is available at https://github.com/Neon-Jing/Guider
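The abstract names the DBPR loss but does not spell it out; the snippet below is a minimal, hypothetical sketch of an interaction-level denoising BPR loss, assuming each observed interaction carries a confidence score produced by an upstream modal re-calibration step. The function name, the threshold `tau`, and the weighting scheme are illustrative, not the authors' exact formulation.
```python
import torch
import torch.nn.functional as F

def denoising_bpr_loss(user_emb, pos_emb, neg_emb, modal_confidence, tau=0.5):
    """BPR with per-interaction re-weighting (illustrative only).

    user_emb, pos_emb, neg_emb: (B, d) embeddings of users, positive items,
        and sampled negative items.
    modal_confidence: (B,) scores in [0, 1], assumed to come from a content-based
        re-calibration step that judges how plausible each interaction is.
    tau: threshold below which an interaction is treated as noisy.
    """
    pos_scores = (user_emb * pos_emb).sum(-1)        # (B,)
    neg_scores = (user_emb * neg_emb).sum(-1)        # (B,)
    bpr = -F.logsigmoid(pos_scores - neg_scores)     # standard BPR term

    # Soft gate: interactions judged noisy by the modal signal contribute little or nothing.
    weights = torch.where(modal_confidence >= tau,
                          modal_confidence,
                          torch.zeros_like(modal_confidence))
    return (weights * bpr).sum() / weights.sum().clamp(min=1e-8)
```
In a plug-and-play setting, a loss of this shape would simply replace the backbone's original BPR term, with the confidence tensor supplied by the content-based calibration module.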
Related papers
- When SparseMoE Meets Noisy Interactions: An Ensemble View on Denoising Recommendation [3.050721435894337]
We propose a novel Adaptive Ensemble Learning (AEL) method for denoising recommendation. AEL employs a sparse gating network as a brain, selecting suitable experts to synthesize appropriate denoising capacities. To address the model-complexity drawback of ensemble learning, AEL also stacks components to create sub-recommenders (a minimal gating sketch follows below).
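As a rough illustration of sparse top-k gating over a handful of denoising "experts", in the spirit of the AEL summary; the layer sizes, expert definitions, and choice of k are assumptions, not the paper's design.
```python
import torch
import torch.nn as nn

class SparseGate(nn.Module):
    def __init__(self, in_dim, num_experts, k=2):
        super().__init__()
        self.proj = nn.Linear(in_dim, num_experts)
        self.k = k

    def forward(self, x):
        logits = self.proj(x)                               # (B, E)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)    # keep only k experts
        mask = torch.full_like(logits, float('-inf'))
        mask.scatter_(-1, topk_idx, topk_val)
        return torch.softmax(mask, dim=-1)                  # sparse expert weights

class DenoisingEnsemble(nn.Module):
    def __init__(self, in_dim, num_experts=4, k=2):
        super().__init__()
        self.gate = SparseGate(in_dim, num_experts, k)
        # Each "expert" here is a small MLP standing in for a sub-recommender.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, in_dim), nn.ReLU(),
                          nn.Linear(in_dim, 1)) for _ in range(num_experts))

    def forward(self, x):
        w = self.gate(x)                                      # (B, E)
        preds = torch.cat([e(x) for e in self.experts], -1)   # (B, E)
        return (w * preds).sum(-1)                            # gated ensemble score
```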
arXiv Detail & Related papers (2024-09-19T12:55:34Z)
- Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback [32.10029754890383]
We propose the Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate multi-modal noise, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities (a toy construction is sketched below). To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss.
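As a rough illustration of building item-item edges from content similarity that is consistent across modalities (not DA-MRS's exact procedure), one could keep only the edges that both modalities agree on; the value of k and the agreement rule are assumptions.
```python
import torch
import torch.nn.functional as F

def cross_modal_item_graph(visual_emb, text_emb, k=10):
    """visual_emb, text_emb: (N, d) item embeddings from two modalities.
    Returns an (N, N) binary adjacency keeping only edges present in the
    top-k neighbour lists of *both* modalities."""
    def topk_adj(emb):
        sim = F.normalize(emb, dim=-1) @ F.normalize(emb, dim=-1).T  # cosine similarity
        sim.fill_diagonal_(float('-inf'))                            # no self-loops
        idx = sim.topk(k, dim=-1).indices
        return torch.zeros_like(sim).scatter_(-1, idx, 1.0)

    # An edge survives only if both modalities agree, filtering modality-specific noise.
    return topk_adj(visual_emb) * topk_adj(text_emb)
```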
arXiv Detail & Related papers (2024-06-18T11:05:32Z)
- TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content [21.90660366765994]
We propose a trustworthy sequential recommendation method that learns from noisy user-generated multi-modal content.
Specifically, we capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference.
In addition, we design a trustworthy decision mechanism that integrates subjective user perspective and objective item perspective.
arXiv Detail & Related papers (2024-04-26T08:23:36Z)
- A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition [53.800937914403654]
Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames.
While applying the dropout technique to the video modality enhances robustness to missing frames, it simultaneously results in a performance loss when dealing with complete data input.
We propose a novel Multimodal Distribution Approximation with Knowledge Distillation (MDA-KD) framework to reduce over-reliance on the audio modality.
arXiv Detail & Related papers (2024-03-07T06:06:55Z)
- DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions [52.63323657077447]
We propose DNMOT, an end-to-end trainable DeNoising Transformer for multiple object tracking.
Specifically, we augment the trajectory with noises during training and make our model learn the denoising process in an encoder-decoder architecture.
We conduct extensive experiments on the MOT17, MOT20, and DanceTrack datasets, and the experimental results show that our method outperforms previous state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2023-09-09T04:40:01Z)
- Mining Stable Preferences: Adaptive Modality Decorrelation for Multimedia Recommendation [23.667430143035787]
We propose a novel MOdality DEcorrelating STable learning framework, MODEST for brevity, to learn users' stable preferences.
Inspired by sample re-weighting techniques, the proposed method aims to estimate a weight for each item, such that the features from different modalities in the weighted distribution are decorrelated.
Our method can serve as a plug-and-play module for existing multimedia recommendation backbones; a minimal re-weighting sketch follows this entry.
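A hedged sketch of sample re-weighting for decorrelation in the spirit of the MODEST summary: learn one weight per item so that visual and textual features are uncorrelated under the weighted distribution. The Frobenius-norm objective and optimiser settings below are assumptions.
```python
import torch

def learn_decorrelation_weights(visual, textual, steps=200, lr=0.05):
    """visual: (N, d_v), textual: (N, d_t) item features. Returns (N,) item weights."""
    logits = torch.zeros(visual.size(0), requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        w = torch.softmax(logits, dim=0)                      # weights sum to 1
        mv = (w.unsqueeze(-1) * visual).sum(0)                # weighted means
        mt = (w.unsqueeze(-1) * textual).sum(0)
        # Weighted cross-covariance between the two modality feature spaces.
        cross_cov = ((visual - mv) * w.unsqueeze(-1)).T @ (textual - mt)
        loss = cross_cov.pow(2).sum()                         # drive correlations to zero
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()
```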
arXiv Detail & Related papers (2023-06-25T09:09:11Z)
- Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising [137.5080784570804]
Video denoising aims at removing noise from videos to recover clean ones.
Some existing works show that optical flow can help the denoising by exploiting the additional spatial-temporal clues from nearby frames.
We propose a new multi-scale refined optical flow-guided video denoising method, which is more robust to different noise levels.
arXiv Detail & Related papers (2022-08-25T00:09:18Z)
- MANet: Improving Video Denoising with a Multi-Alignment Network [72.93429911044903]
We present a multi-alignment network, which generates multiple flow proposals followed by attention-based averaging.
Experiments on a large-scale video dataset demonstrate that our method improves the denoising baseline model by 0.2 dB.
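As a rough illustration of attention-based averaging over multiple aligned candidates (the summary does not specify MANet's exact fusion design; the scoring network and tensor layout below are assumptions):
```python
import torch
import torch.nn as nn

class ProposalFusion(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        # Scores each aligned candidate per pixel; softmax is taken over candidates.
        self.score = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, candidates):
        """candidates: (B, P, C, H, W) frames warped by P flow proposals."""
        b, p, c, h, w = candidates.shape
        logits = self.score(candidates.reshape(b * p, c, h, w)).reshape(b, p, 1, h, w)
        attn = torch.softmax(logits, dim=1)          # weights over proposals
        return (attn * candidates).sum(dim=1)        # fused (B, C, H, W) frame
```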
arXiv Detail & Related papers (2022-02-20T00:52:07Z)
- Probabilistic and Variational Recommendation Denoising [56.879165033014026]
Learning from implicit feedback is one of the most common cases in the application of recommender systems.
We propose probabilistic and variational recommendation denoising for implicit feedback.
We employ the proposed DPI and DVAE on four state-of-the-art recommendation models and conduct experiments on three datasets.
arXiv Detail & Related papers (2021-05-20T08:59:44Z)
- Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders [81.30960319178725]
We propose DivNoising, a denoising approach based on fully convolutional variational autoencoders (VAEs).
First we introduce a principled way of formulating the unsupervised denoising problem within the VAE framework by explicitly incorporating imaging noise models into the decoder.
We show that such a noise model can either be measured, bootstrapped from noisy data, or co-learned during training; a decoder-level sketch follows this entry.
arXiv Detail & Related papers (2020-06-10T21:28:13Z)
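A minimal sketch of folding an explicit noise model into a VAE decoder, in the spirit of the DivNoising summary above: the decoder predicts a clean signal and the likelihood of the noisy input is evaluated under a co-learned Gaussian noise model. Architecture sizes and the Gaussian choice are assumptions.
```python
import torch
import torch.nn as nn

class NoiseModelVAE(nn.Module):
    def __init__(self, channels=1, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 2 * latent, 3, padding=1))
        self.dec = nn.Sequential(nn.Conv2d(latent, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, channels, 3, padding=1))
        # Co-learned noise level (log std); could instead be measured offline
        # or bootstrapped from the noisy data themselves.
        self.log_sigma = nn.Parameter(torch.zeros(1))

    def forward(self, noisy):
        mu, logvar = self.enc(noisy).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterisation
        clean = self.dec(z)                                     # predicted clean signal
        # Gaussian noise model: -log p(noisy | clean, sigma), up to a constant.
        sigma2 = (2 * self.log_sigma).exp()
        rec = 0.5 * ((noisy - clean) ** 2 / sigma2 + 2 * self.log_sigma).sum()
        kld = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()
        return rec + kld, clean
```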
This list is automatically generated from the titles and abstracts of the papers in this site.