Improved Personalized Headline Generation via Denoising Fake Interests from Implicit Feedback
- URL: http://arxiv.org/abs/2508.07178v2
- Date: Thu, 14 Aug 2025 06:43:57 GMT
- Title: Improved Personalized Headline Generation via Denoising Fake Interests from Implicit Feedback
- Authors: Kejin Liu, Junhong Lian, Xiang Ao, Ningtao Wang, Xing Fu, Yu Cheng, Weiqiang Wang, Xinyu Liu
- Abstract summary: We propose a novel Personalized Headline Generation framework via Denoising Fake Interests from Implicit Feedback (PHG-DIF). PHG-DIF first employs dual-stage filtering to effectively remove clickstream noise, identified by short dwell times and abnormal click bursts. We release DT-PENS, a new benchmark dataset comprising the click behavior of 1,000 carefully curated users and nearly 10,000 annotated personalized headlines.
- Score: 50.365038320507686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate personalized headline generation hinges on precisely capturing user interests from historical behaviors. However, existing methods neglect personalized-irrelevant click noise in entire historical clickstreams, which may lead to hallucinated headlines that deviate from genuine user preferences. In this paper, we reveal the detrimental impact of click noise on personalized generation quality through rigorous analysis in both user and news dimensions. Based on these insights, we propose a novel Personalized Headline Generation framework via Denoising Fake Interests from Implicit Feedback (PHG-DIF). PHG-DIF first employs dual-stage filtering to effectively remove clickstream noise, identified by short dwell times and abnormal click bursts, and then leverages multi-level temporal fusion to dynamically model users' evolving and multi-faceted interests for precise profiling. Moreover, we release DT-PENS, a new benchmark dataset comprising the click behavior of 1,000 carefully curated users and nearly 10,000 annotated personalized headlines with historical dwell time annotations. Extensive experiments demonstrate that PHG-DIF substantially mitigates the adverse effects of click noise and significantly improves headline quality, achieving state-of-the-art (SOTA) results on DT-PENS. Our framework implementation and dataset are available at https://github.com/liukejin-up/PHG-DIF.
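The dual-stage filtering described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Click` structure and the thresholds (`min_dwell`, `burst_window`, `max_burst`) are illustrative assumptions, not the paper's actual hyperparameters.

```python
from dataclasses import dataclass

@dataclass
class Click:
    item_id: str
    timestamp: float   # seconds since epoch
    dwell_time: float  # seconds spent reading the clicked item

def denoise_clickstream(clicks, min_dwell=5.0, burst_window=10.0, max_burst=3):
    """Two-stage filter over a user's clickstream.

    Stage 1 drops clicks with short dwell times (likely accidental or
    uninterested clicks); stage 2 drops clicks belonging to abnormal
    bursts (more than `max_burst` clicks inside `burst_window` seconds).
    All thresholds are illustrative placeholders.
    """
    # Stage 1: short dwell times signal noise rather than genuine interest.
    stage1 = [c for c in clicks if c.dwell_time >= min_dwell]

    # Stage 2: keep at most `max_burst` clicks per sliding time window;
    # clicks beyond that within the window are treated as burst noise.
    stage1.sort(key=lambda c: c.timestamp)
    kept, window = [], []
    for c in stage1:
        window = [w for w in window if c.timestamp - w.timestamp <= burst_window]
        window.append(c)
        if len(window) <= max_burst:
            kept.append(c)
    return kept
```

The remaining clicks would then feed the temporal-fusion stage that models the user's evolving interests.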
Related papers
- Semantics-Aware Denoising: A PLM-Guided Sample Reweighting Strategy for Robust Recommendation [4.631922211808715]
Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. We propose SAID (Semantics-Aware Implicit Denoising), a framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines.
arXiv Detail & Related papers (2026-02-17T04:58:21Z) - Rethinking Purity and Diversity in Multi-Behavior Sequential Recommendation from the Frequency Perspective [48.60281642851056]
In recommendation systems, users often exhibit multiple behaviors, such as browsing, clicking, and purchasing. Some of this behavioral data inevitably introduces noise into the modeling of user interests. These studies indicate that low-frequency information tends to be valuable and reliable, while high-frequency information is often associated with noise.
arXiv Detail & Related papers (2025-08-28T04:55:02Z) - iHHO-SMOTe: A Cleansed Approach for Handling Outliers and Reducing Noise to Improve Imbalanced Data Classification [0.0]
Classifying imbalanced datasets remains a significant challenge in machine learning. The Synthetic Minority Over-sampling Technique (SMOTE) generates new instances for the under-represented minority class. A proposed approach, iHHO-SMOTe, addresses the limitations of SMOTE by first cleansing the data of noise points.
arXiv Detail & Related papers (2025-04-17T11:17:53Z) - Variational Bayesian Personalized Ranking [39.24591060825056]
Variational BPR is a novel and easily implementable learning objective that integrates likelihood optimization, noise reduction, and popularity debiasing. We introduce an attention-based latent interest prototype contrastive mechanism, replacing instance-level contrastive learning, to effectively reduce noise from problematic samples. Empirically, we demonstrate the effectiveness of Variational BPR on popular backbone recommendation models.
arXiv Detail & Related papers (2025-03-14T04:22:01Z) - Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation [4.297249011611168]
Implicit feedback is often used to build recommender systems.
Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns.
We propose a Large Language Model Enhanced Hard Sample Denoising framework.
arXiv Detail & Related papers (2024-09-16T14:57:09Z) - Impact of Preference Noise on the Alignment Performance of Generative Language Models [31.64856885517905]
We study the impact of preference noise on the alignment performance in two tasks (summarization and dialogue generation)
We find that the alignment performance can be highly sensitive to the noise rates in the preference data.
To mitigate the impact of noise, confidence-based data filtering shows significant benefit when certain types of noise are present.
arXiv Detail & Related papers (2024-04-15T14:21:53Z) - Adaptive Differential Privacy in Federated Learning: A Priority-Based Approach [0.0]
Federated learning (FL) develops global models without direct access to local datasets.
DP offers a framework that gives a privacy guarantee by adding certain amounts of noise to parameters.
We propose adaptive noise addition in FL which decides the value of injected noise based on features' relative importance.
arXiv Detail & Related papers (2024-01-04T03:01:15Z) - FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels [99.70895640578816]
Federated learning with noisy labels (F-LNL) aims at seeking an optimal server model via collaborative distributed learning.
We present FedDiv to tackle the challenges of F-LNL. Specifically, we propose a global noise filter called Federated Noise Filter.
arXiv Detail & Related papers (2023-12-19T15:46:47Z) - An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition [51.232523987916636]
Differential privacy (DP) is one data protection avenue to safeguard user information used for training deep models by imposing noisy distortion on privacy data.
In this work, we extend PATE learning to work with dynamic patterns, namely speech, and perform one very first experimental study on ASR to avoid acoustic data leakage.
arXiv Detail & Related papers (2022-10-11T16:55:54Z) - On Dynamic Noise Influence in Differentially Private Learning [102.6791870228147]
Private Gradient Descent (PGD) is a commonly used private learning framework, which adds noise according to the differential privacy protocol.
Recent studies show that dynamic privacy schedules can improve utility at the final iteration, yet the theoretical understanding of the effectiveness of such schedules remains limited.
This paper provides comprehensive analysis of noise influence in dynamic privacy schedules to answer these critical questions.
arXiv Detail & Related papers (2021-01-19T02:04:00Z) - RDP-GAN: A Rényi-Differential Privacy based Generative Adversarial Network [75.81653258081435]
Generative adversarial network (GAN) has attracted increasing attention recently owing to its impressive ability to generate realistic samples with high privacy protection.
However, when GANs are applied on sensitive or private training examples, such as medical or financial records, it is still probable to divulge individuals' sensitive and private information.
We propose a Rényi-differentially private GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noise to the value of the loss function during training.
arXiv Detail & Related papers (2020-07-04T09:51:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.