Data Watermarking for Sequential Recommender Systems
- URL: http://arxiv.org/abs/2411.12989v1
- Date: Wed, 20 Nov 2024 02:34:21 GMT
- Title: Data Watermarking for Sequential Recommender Systems
- Authors: Sixiao Zhang, Cheng Long, Wei Yuan, Hongxu Chen, Hongzhi Yin,
- Abstract summary: We study the problem of data watermarking for sequential recommender systems.
dataset watermarking protects the ownership of the entire dataset, and user watermarking safeguards the data of individual users.
Our approach involves randomly selecting unpopular items to create a watermark sequence, which is then inserted into normal users' interaction sequences.
- Score: 52.207721219147814
- License:
- Abstract: In the era of large foundation models, data has become a crucial component for building high-performance AI systems. As the demand for high-quality and large-scale data continues to rise, data copyright protection is attracting increasing attention. In this work, we explore the problem of data watermarking for sequential recommender systems, where a watermark is embedded into the target dataset and can be detected in models trained on that dataset. We address two specific challenges: dataset watermarking, which protects the ownership of the entire dataset, and user watermarking, which safeguards the data of individual users. We systematically define these problems and present a method named DWRS to address them. Our approach involves randomly selecting unpopular items to create a watermark sequence, which is then inserted into normal users' interaction sequences. Extensive experiments on five representative sequential recommendation models and three benchmark datasets demonstrate the effectiveness of DWRS in protecting data copyright while preserving model utility.
Related papers
- Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models [79.65071553905021]
We propose Data Advisor, a method for generating data that takes into account the characteristics of the desired dataset.
Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation.
arXiv Detail & Related papers (2024-10-07T17:59:58Z) - Adaptive and Robust Watermark for Generative Tabular Data [8.566821590631907]
We propose a flexible and robust watermarking mechanism for generative tabular data.
We show theoretically and empirically that the watermarked datasets have negligible impact on the data quality and downstream utility.
arXiv Detail & Related papers (2024-09-23T04:37:30Z) - Watermarking Recommender Systems [52.207721219147814]
We introduce Autoregressive Out-of-distribution Watermarking (AOW), a novel technique tailored specifically for recommender systems.
Our approach entails selecting an initial item and querying it through the oracle model, followed by the selection of subsequent items with small prediction scores.
To assess the efficacy of the watermark, the model is tasked with predicting the subsequent item given a truncated watermark sequence.
arXiv Detail & Related papers (2024-07-17T06:51:24Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems [2.812395851874055]
We introduce a Score-based Diffusion Recommendation Module (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems.
SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity.
Our method outperforms competing baselines such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models in synthesizing various datasets to replace or augment the original data by an average improvement of 4.30% in Recall@k and 4.65% in NDCG@k.
arXiv Detail & Related papers (2023-11-06T19:52:55Z) - Domain Watermark: Effective and Harmless Dataset Copyright Protection is
Closed at Hand [96.26251471253823]
backdoor-based dataset ownership verification (DOV) is currently the only feasible approach to protect the copyright of open-source datasets.
We make watermarked models (trained on the protected dataset) correctly classify some hard' samples that will be misclassified by the benign model.
arXiv Detail & Related papers (2023-10-09T11:23:05Z) - Did You Train on My Dataset? Towards Public Dataset Protection with
Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding public-available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z) - Removing Backdoor-Based Watermarks in Neural Networks with Limited Data [26.050649487499626]
Trading deep models is highly demanded and lucrative nowadays.
naive trading schemes typically involve potential risks related to copyright and trustworthiness issues.
We propose a novel backdoor-based watermark removal framework using limited data, dubbed WILD.
arXiv Detail & Related papers (2020-08-02T06:25:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.