Semantics-Aware Denoising: A PLM-Guided Sample Reweighting Strategy for Robust Recommendation
- URL: http://arxiv.org/abs/2602.15359v1
- Date: Tue, 17 Feb 2026 04:58:21 GMT
- Title: Semantics-Aware Denoising: A PLM-Guided Sample Reweighting Strategy for Robust Recommendation
- Authors: Xikai Yang, Yang Wang, Yilin Li, Sebastian Sun
- Abstract summary: Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. We propose SAID (Semantics-Aware Implicit Denoising), a framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines.
- Score: 4.631922211808715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. However, click interactions inherently contain substantial noise, including accidental clicks, clickbait-induced interactions, and exploratory browsing behaviors that do not reflect genuine user preferences. Training recommendation models with such noisy positive samples leads to degraded prediction accuracy and unreliable recommendations. In this paper, we propose SAID (Semantics-Aware Implicit Denoising), a simple yet effective framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Our approach constructs textual user interest profiles from historical behaviors and computes semantic similarity with target item descriptions using pre-trained language model (PLM) based text encoders. The similarity scores are then transformed into sample weights that modulate the training loss, effectively reducing the impact of semantically inconsistent clicks. Unlike existing denoising methods that require complex auxiliary networks or multi-stage training procedures, SAID only modifies the loss function while keeping the backbone recommendation model unchanged. Extensive experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines, with particularly notable robustness under high noise conditions.
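The reweighting scheme the abstract describes (cosine similarity between PLM-encoded user profiles and item descriptions, mapped to per-sample loss weights) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sigmoid-with-temperature mapping and the random stand-in embeddings are assumptions, since the abstract does not specify the weight transform or the encoder.

```python
import numpy as np

def cosine_similarity(a, b):
    # Row-wise cosine similarity between user-profile and item embeddings.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def similarity_to_weights(sim, temperature=0.2):
    # Map similarity scores in [-1, 1] to sample weights in (0, 1).
    # The sigmoid/temperature choice is an illustrative assumption.
    return 1.0 / (1.0 + np.exp(-sim / temperature))

def weighted_bce_loss(preds, labels, weights, eps=1e-7):
    # Per-sample binary cross-entropy modulated by the semantic weights,
    # so semantically inconsistent clicks contribute less to training.
    preds = np.clip(preds, eps, 1 - eps)
    bce = -(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))
    return np.mean(weights * bce)

# Toy example: 3 clicked items; random vectors stand in for the output
# of a pre-trained text encoder applied to profiles and descriptions.
rng = np.random.default_rng(0)
user_profiles = rng.normal(size=(3, 8))
item_descriptions = rng.normal(size=(3, 8))
sim = cosine_similarity(user_profiles, item_descriptions)
weights = similarity_to_weights(sim)
loss = weighted_bce_loss(np.array([0.9, 0.8, 0.7]), np.ones(3), weights)
```

Because only the loss is modulated, the backbone recommender and its training loop stay unchanged, which matches the plug-in nature the abstract emphasizes.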
Related papers
- Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards [8.109014000578766]
We present ASR-TRA, a novel Test-time Reinforcement Adaptation framework inspired by causal intervention. Our method achieves higher accuracy while maintaining lower latency than existing TTA baselines. Our approach provides a practical and robust solution for deploying ASR systems in challenging real-world conditions.
arXiv Detail & Related papers (2026-03-05T14:43:15Z) - Understand your Users, An Ensemble Learning Framework for Natural Noise Filtering in Recommender Systems [2.183830053778608]
This paper addresses the challenge of defining noise, which is inherently related to variability in human preferences and behaviors. In classifying changes in user tendencies, we distinguish three kinds of phenomena: external factors that directly influence users' sentiment, serendipity causing unexpected preferences, and incidental interactions perceived as noise.
arXiv Detail & Related papers (2025-09-23T02:36:27Z) - From Entity Reliability to Clean Feedback: An Entity-Aware Denoising Framework Beyond Interaction-Level Signals [20.323837731778358]
Implicit feedback is central to recommender systems but is inherently noisy, often impairing model training and degrading user experience. We propose EARD (Entity-Aware Reliability-Driven Denoising), a lightweight framework that shifts the focus from interaction-level signals to entity-level reliability.
arXiv Detail & Related papers (2025-08-14T17:20:12Z) - Shapley Value-driven Data Pruning for Recommender Systems [6.170723867840994]
Shapley Value-driven Valuation (SVV) is a framework that evaluates interactions based on their objective impact on model training. SVV highlights interactions with high values while downplaying low-value ones to achieve effective data pruning for recommender systems. Experiments on four real-world datasets show that SVV outperforms existing denoising methods in both accuracy and robustness.
arXiv Detail & Related papers (2025-05-28T07:27:59Z) - Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback [59.768119380109084]
This paper introduces an interactive continual learning paradigm where AI models dynamically learn new skills from real-time human feedback. We propose RiCL, a Reinforced interactive Continual Learning framework leveraging Large Language Models (LLMs). Our RiCL approach substantially outperforms existing combinations of state-of-the-art online continual learning and noisy-label learning methods.
arXiv Detail & Related papers (2025-05-15T03:22:03Z) - On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI (systems that autonomously plan and act) is becoming widespread, yet its success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly incorporates feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - $C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction [80.57232374640911]
We propose a model-agnostic strategy called Mask-And-Recover (MAR). MAR integrates both inter- and intra-modality contextual correlations to enable global inference within extraction modules. To better target challenging parts within each sample, we introduce a Fine-grained Confidence Score (FCS) model.
arXiv Detail & Related papers (2025-04-01T13:01:30Z) - SRA-CL: Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation [23.050104678143935]
We propose a novel approach named Semantic Retrieval Augmented Contrastive Learning (SRA-CL). SRA-CL leverages the semantic understanding and reasoning capabilities of LLMs to generate expressive embeddings that capture user preferences and item characteristics. SRA-CL adopts a plug-and-play design, enabling seamless integration with existing sequential recommendation architectures.
arXiv Detail & Related papers (2025-03-06T07:25:19Z) - Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation [4.297249011611168]
Implicit feedback is often used to build recommender systems.
Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns.
We propose a Large Language Model Enhanced Hard Sample Denoising framework.
arXiv Detail & Related papers (2024-09-16T14:57:09Z) - Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z) - ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models.
Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA, and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.