Related papers: Shapley Value-driven Data Pruning for Recommender Systems

URL: http://arxiv.org/abs/2505.22057v1
Date: Wed, 28 May 2025 07:27:59 GMT
Title: Shapley Value-driven Data Pruning for Recommender Systems
Authors: Yansen Zhang, Xiaokun Zhang, Ziqiang Cui, Chen Ma
Abstract summary: Shapley Value-driven Valuation (SVV) is a framework that evaluates interactions based on their objective impact on model training. SVV highlights the interactions with high values while downplaying low ones to achieve effective data pruning for recommender systems. Experiments on four real-world datasets show that SVV outperforms existing denoising methods in both accuracy and robustness.
Score: 6.170723867840994
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recommender systems often suffer from noisy interactions like accidental clicks or popularity bias. Existing denoising methods typically identify users' intent in their interactions, and filter out noisy interactions that deviate from the assumed intent. However, they ignore that interactions deemed noisy could still aid model training, while some "clean" interactions offer little learning value. To bridge this gap, we propose Shapley Value-driven Valuation (SVV), a framework that evaluates interactions based on their objective impact on model training rather than subjective intent assumptions. In SVV, a real-time Shapley value estimation method is devised to quantify each interaction's value based on its contribution to reducing training loss. Afterward, SVV highlights the interactions with high values while downplaying low ones to achieve effective data pruning for recommender systems. In addition, we develop a simulated noise protocol to examine the performance of various denoising approaches systematically. Experiments on four real-world datasets show that SVV outperforms existing denoising methods in both accuracy and robustness. Further analysis also demonstrates that our SVV can preserve training-critical interactions and offer interpretable noise assessment. This work shifts denoising from heuristic filtering to principled, model-driven interaction valuation.
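
The abstract does not spell out SVV's real-time estimator, so the following is only a rough illustration of the general idea of valuing training points by their contribution to reducing loss. It is a minimal sketch of generic Monte Carlo permutation sampling for Shapley values, not the authors' method. The toy ratings, the mean predictor, and the names `monte_carlo_shapley`, `utility`, `train`, and `valid` are all assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    order = list(range(n_points))
    totals = [0.0] * n_points
    for _ in range(n_permutations):
        rng.shuffle(order)
        subset: List[int] = []
        previous = utility(subset)
        for i in order:
            subset.append(i)
            current = utility(subset)
            # Marginal contribution of point i given the points added before it.
            totals[i] += current - previous
            previous = current
    return [t / n_permutations for t in totals]


# Hypothetical toy data: four consistent ratings and one outlier (index 4).
train = [4.0, 4.2, 3.9, 4.1, 0.5]
valid = [4.0, 4.1]


def utility(subset: Sequence[int]) -> float:
    """Negative validation MSE of a mean predictor fit on the subset (predicts 0.0 if empty)."""
    pred = sum(train[i] for i in subset) / len(subset) if subset else 0.0
    return -sum((y - pred) ** 2 for y in valid) / len(valid)


values = monte_carlo_shapley(len(train), utility)

# Prune by value: keep the four highest-valued points.
keep = sorted(range(len(train)), key=lambda i: values[i], reverse=True)[:4]
```

In this toy setup the outlier tends to pull the fitted mean away from the validation ratings, so its estimated value is the lowest and it is the point dropped by the top-4 cut. Exact Shapley computation is exponential in the number of points, which is why a sampling approximation is used here.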

Related papers

Semantics-Aware Denoising: A PLM-Guided Sample Reweighting Strategy for Robust Recommendation [4.631922211808715]
Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. We propose SAID (Semantics-Aware Implicit Denoising), a framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines.
arXiv Detail & Related papers (2026-02-17T04:58:21Z)
Rethinking Purity and Diversity in Multi-Behavior Sequential Recommendation from the Frequency Perspective [48.60281642851056]
In recommendation systems, users often exhibit multiple behaviors, such as browsing, clicking, and purchasing. Some behavior data will also bring inevitable noise to the modeling of user interests. These studies indicate that low-frequency information tends to be valuable and reliable, while high-frequency information is often associated with noise.
arXiv Detail & Related papers (2025-08-28T04:55:02Z)
From Entity Reliability to Clean Feedback: An Entity-Aware Denoising Framework Beyond Interaction-Level Signals [20.323837731778358]
Implicit feedback is central to recommender systems but is inherently noisy, often impairing model training and degrading user experience. We propose EARD (Entity-Aware Reliability-Driven Denoising), a lightweight framework that shifts the focus from interaction-level signals to entity-level reliability.
arXiv Detail & Related papers (2025-08-14T17:20:12Z)
Machine Unlearning for Robust DNNs: Attribution-Guided Partitioning and Neuron Pruning in Noisy Environments [5.8166742412657895]
Deep neural networks (DNNs) have achieved remarkable success across diverse domains, but their performance can be severely degraded by noisy or corrupted training data. We propose a novel framework that integrates attribution-guided data partitioning, discriminative neuron pruning, and targeted fine-tuning to mitigate the impact of noisy samples. Our framework achieves approximately a 10% absolute accuracy improvement over standard retraining on CIFAR-10 with injected label noise.
arXiv Detail & Related papers (2025-06-13T09:37:11Z)
Enhancing Federated Survival Analysis through Peer-Driven Client Reputation in Healthcare [1.2289361708127877]
Federated Learning holds great promise for digital health by enabling collaborative model training without compromising patient data privacy. We propose a peer-driven reputation mechanism for federated healthcare that integrates decentralized peer feedback with clustering-based noise handling. Applying differential privacy to client-side model updates ensures sensitive information remains protected during reputation computation.
arXiv Detail & Related papers (2025-05-22T03:49:51Z)
DynaNoise: Dynamic Probabilistic Noise Injection for Defending Against Membership Inference Attacks [6.610581923321801]
Membership Inference Attacks (MIAs) pose a significant risk to the privacy of training datasets. Traditional mitigation techniques rely on injecting a fixed amount of noise during training or inference. We present DynaNoise, an adaptive approach that dynamically modulates noise injection based on query sensitivity.
arXiv Detail & Related papers (2025-05-19T17:07:00Z)
Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback [59.768119380109084]
This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback. We propose RiCL, a Reinforced interactive Continual Learning framework leveraging Large Language Models (LLMs). Our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.
arXiv Detail & Related papers (2025-05-15T03:22:03Z)
Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation [4.297249011611168]
Implicit feedback is often used to build recommender systems. Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns. We propose a Large Language Model Enhanced Hard Sample Denoising framework.
arXiv Detail & Related papers (2024-09-16T14:57:09Z)
Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC). NPC consists of a detection module and a correction module. We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks. We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation Models [24.455665093145818]
We propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, and fine-tuning. WSLRec resolves the incompleteness problem by pre-training models on extra weak supervisions from model-free methods like BR and ItemCF, while resolving the inaccuracy problem by leveraging the top-$k$ mining to screen out reliable user-item relevance from weak supervisions for fine-tuning.
arXiv Detail & Related papers (2022-02-28T08:55:12Z)
Probabilistic and Variational Recommendation Denoising [56.879165033014026]
Learning from implicit feedback is one of the most common cases in the application of recommender systems. We propose probabilistic and variational recommendation denoising for implicit feedback. We employ the proposed DPI and DVAE on four state-of-the-art recommendation models and conduct experiments on three datasets.
arXiv Detail & Related papers (2021-05-20T08:59:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.