Shapley Value-driven Data Pruning for Recommender Systems
- URL: http://arxiv.org/abs/2505.22057v1
- Date: Wed, 28 May 2025 07:27:59 GMT
- Title: Shapley Value-driven Data Pruning for Recommender Systems
- Authors: Yansen Zhang, Xiaokun Zhang, Ziqiang Cui, Chen Ma
- Abstract summary: Shapley Value-driven Valuation (SVV) is a framework that evaluates interactions based on their objective impact on model training. SVV highlights the interactions with high values while downplaying low ones to achieve effective data pruning for recommender systems. Experiments on four real-world datasets show that SVV outperforms existing denoising methods in both accuracy and robustness.
- Score: 6.170723867840994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recommender systems often suffer from noisy interactions like accidental clicks or popularity bias. Existing denoising methods typically identify users' intent in their interactions, and filter out noisy interactions that deviate from the assumed intent. However, they ignore that interactions deemed noisy could still aid model training, while some "clean" interactions offer little learning value. To bridge this gap, we propose Shapley Value-driven Valuation (SVV), a framework that evaluates interactions based on their objective impact on model training rather than subjective intent assumptions. In SVV, a real-time Shapley value estimation method is devised to quantify each interaction's value based on its contribution to reducing training loss. Afterward, SVV highlights the interactions with high values while downplaying low ones to achieve effective data pruning for recommender systems. In addition, we develop a simulated noise protocol to examine the performance of various denoising approaches systematically. Experiments on four real-world datasets show that SVV outperforms existing denoising methods in both accuracy and robustness. Further analysis also demonstrates that our SVV can preserve training-critical interactions and offer interpretable noise assessment. This work shifts denoising from heuristic filtering to principled, model-driven interaction valuation.
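The idea of valuing each training example by its contribution to reducing loss, then pruning the low-valued ones, can be illustrated with a generic Monte Carlo (permutation-sampling) Shapley estimate. The sketch below is not the real-time estimator devised in the paper, whose details are not given in the abstract. It is a toy under stated assumptions: a one-parameter least-squares model y = w * x stands in for the recommender, utility is the negative validation loss, and all names (`fit_slope`, `utility`, `monte_carlo_shapley`, `prune`) are hypothetical.

```python
import random

def fit_slope(points):
    """Least-squares slope for y = w * x; returns 0.0 for an empty subset."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx if sxx > 0 else 0.0

def utility(points, val):
    """Negative mean squared validation loss of the toy model fit on `points`."""
    w = fit_slope(points)
    return -sum((y - w * x) ** 2 for x, y in val) / len(val)

def monte_carlo_shapley(train, val, n_perm=200, seed=0):
    """Average each point's marginal utility gain over random orderings."""
    rng = random.Random(seed)
    values = [0.0] * len(train)
    for _ in range(n_perm):
        order = list(range(len(train)))
        rng.shuffle(order)
        subset = []
        prev = utility(subset, val)
        for i in order:
            subset.append(train[i])
            cur = utility(subset, val)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / n_perm for v in values]

def prune(train, values, n_drop):
    """Drop the `n_drop` lowest-valued points, keeping the original order."""
    ranked = sorted(range(len(train)), key=lambda i: values[i])
    dropped = set(ranked[:n_drop])
    return [p for i, p in enumerate(train) if i not in dropped]

# Toy data (assumption): clean points follow y = 2x, noisy points follow y = -2x.
clean = [(float(x), 2.0 * x) for x in range(1, 9)]
noisy = [(5.0, -10.0), (6.0, -12.0)]
train = clean + noisy
val = [(float(x), 2.0 * x) for x in range(1, 5)]

values = monte_carlo_shapley(train, val)
pruned = prune(train, values, n_drop=len(noisy))
```

In this toy setting the two noisy points can only pull the fitted slope away from the validation data, so they receive the lowest values and are the ones removed. A real recommender would need a far cheaper estimator than retraining on every prefix of every permutation, which is presumably the motivation for the real-time method the abstract mentions.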
Related papers
- Machine Unlearning for Robust DNNs: Attribution-Guided Partitioning and Neuron Pruning in Noisy Environments [5.8166742412657895]
Deep neural networks (DNNs) have achieved remarkable success across diverse domains, but their performance can be severely degraded by noisy or corrupted training data. We propose a novel framework that integrates attribution-guided data partitioning, discriminative neuron pruning, and targeted fine-tuning to mitigate the impact of noisy samples. Our framework achieves approximately a 10% absolute accuracy improvement over standard retraining on CIFAR-10 with injected label noise.
arXiv Detail & Related papers (2025-06-13T09:37:11Z) - Enhancing Federated Survival Analysis through Peer-Driven Client Reputation in Healthcare [1.2289361708127877]
Federated Learning holds great promise for digital health by enabling collaborative model training without compromising patient data privacy. We propose a peer-driven reputation mechanism for federated healthcare that integrates decentralized peer feedback with clustering-based noise handling. Applying differential privacy to client-side model updates ensures sensitive information remains protected during reputation computation.
arXiv Detail & Related papers (2025-05-22T03:49:51Z) - DynaNoise: Dynamic Probabilistic Noise Injection for Defending Against Membership Inference Attacks [6.610581923321801]
Membership Inference Attacks (MIAs) pose a significant risk to the privacy of training datasets. Traditional mitigation techniques rely on injecting a fixed amount of noise during training or inference. We present DynaNoise, an adaptive approach that dynamically modulates noise injection based on query sensitivity.
arXiv Detail & Related papers (2025-05-19T17:07:00Z) - Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback [59.768119380109084]
This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback. We propose RiCL, a Reinforced interactive Continual Learning framework leveraging Large Language Models (LLMs). Our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.
arXiv Detail & Related papers (2025-05-15T03:22:03Z) - Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation [4.297249011611168]
Implicit feedback is often used to build recommender systems.
Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns.
We propose a Large Language Model Enhanced Hard Sample Denoising framework.
arXiv Detail & Related papers (2024-09-16T14:57:09Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z) - Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation Models [24.455665093145818]
We propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, and fine-tuning.
WSLRec resolves the incompleteness problem by pre-training models on extra weak supervisions from model-free methods like BR and ItemCF, while resolving the inaccuracy problem by leveraging the top-$k$ mining to screen out reliable user-item relevance from weak supervisions for fine-tuning.
arXiv Detail & Related papers (2022-02-28T08:55:12Z) - Probabilistic and Variational Recommendation Denoising [56.879165033014026]
Learning from implicit feedback is one of the most common cases in the application of recommender systems.
We propose probabilistic and variational recommendation denoising for implicit feedback.
We employ the proposed DPI and DVAE on four state-of-the-art recommendation models and conduct experiments on three datasets.
arXiv Detail & Related papers (2021-05-20T08:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.