Data Leakage via Access Patterns of Sparse Features in Deep
Learning-based Recommendation Systems
- URL: http://arxiv.org/abs/2212.06264v1
- Date: Mon, 12 Dec 2022 22:05:46 GMT
- Title: Data Leakage via Access Patterns of Sparse Features in Deep
Learning-based Recommendation Systems
- Authors: Hanieh Hashemi, Wenjie Xiong, Liu Ke, Kiwan Maeng, Murali Annavaram,
G. Edward Suh, Hsien-Hsin S. Lee
- Abstract summary: State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information.
This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns.
- Score: 10.657479921108287
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online personalized recommendation services are generally hosted in the cloud
where users query the cloud-based model to receive recommended content such as
merchandise of interest or news feeds. State-of-the-art recommendation models
rely on sparse and dense features to represent users' profile information and
the items they interact with. Although sparse features account for 99% of the
total model size, little attention has been paid to the potential information
leakage through them. These sparse features are employed
to track users' behavior, e.g., their click history, object interactions, etc.,
potentially carrying each user's private information. Sparse features are
represented as learned embedding vectors that are stored in large tables, and
personalized recommendation is performed by using a specific user's sparse
feature to index into the tables. Even with recently proposed methods that
hide the computation happening in the cloud, an attacker in the cloud may
still be able to track the access patterns to the embedding tables. This paper
explores the private information that may be learned by tracking a
recommendation model's sparse feature access patterns. We first characterize
the types of attacks that can be carried out on sparse features in
recommendation models in an untrusted cloud, followed by a demonstration of how
each of these attacks leads to extracting users' private information or
tracking users by their behavior over time.
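To make the leakage channel concrete, below is a minimal Python/NumPy sketch, not the paper's code: the table size, the hypothetical users' click histories, and the Jaccard-overlap heuristic for linking queries are illustrative assumptions. It only illustrates the point above that the rows read from an embedding table form an access pattern an untrusted cloud can observe and use to track a user, even when the computation itself is hidden.
```python
# Minimal sketch (assumed setup, not the paper's attack code): sparse-feature
# lookups reveal which embedding rows are touched, and that access pattern
# alone can link queries over time.
import numpy as np

rng = np.random.default_rng(0)

# One embedding table for a sparse feature (e.g., IDs of items a user clicked).
NUM_ITEMS, DIM = 10_000, 16
embedding_table = rng.normal(size=(NUM_ITEMS, DIM)).astype(np.float32)

def lookup(indices):
    """Gather embedding rows for a query's sparse feature.

    Even if the downstream computation is hidden (e.g., inside an enclave),
    the rows read from this large table form an observable access pattern.
    """
    return embedding_table[indices]

def observed_pattern(indices):
    """What a cloud-side observer can record: the set of rows accessed."""
    return {int(i) for i in indices}

# Hypothetical click histories: two queries from the same user share most of
# their items; a query from a different user does not.
user_a_day1 = rng.choice(NUM_ITEMS, size=50, replace=False)
user_a_day2 = np.concatenate(
    [user_a_day1[:40], rng.choice(NUM_ITEMS, size=10, replace=False)])
user_b = rng.choice(NUM_ITEMS, size=50, replace=False)

_ = lookup(user_a_day1)  # the service's normal gather; the indices leak regardless

def jaccard(p, q):
    return len(p & q) / len(p | q)

# High overlap links two queries to the same user; overlap between unrelated
# users of a 10k-row table is near zero.
print("same user :", jaccard(observed_pattern(user_a_day1),
                             observed_pattern(user_a_day2)))
print("diff user :", jaccard(observed_pattern(user_a_day1),
                             observed_pattern(user_b)))
```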
Related papers
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge of re-identification attacks enabled by Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Learning User Embeddings from Human Gaze for Personalised Saliency Prediction [12.361829928359136]
We present a novel method to extract user embeddings from pairs of natural images and corresponding saliency maps.
At the core of our method is a Siamese convolutional neural encoder that learns the user embeddings by contrasting the image and personal saliency map pairs of different users.
arXiv Detail & Related papers (2024-03-20T14:58:40Z)
- Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective [16.487545258246932]
Modern machine learning systems use models trained on ever-growing corpora.
Metadata such as ownership, access control, or licensing information is ignored during training.
We take an information flow control perspective to describe machine learning systems.
arXiv Detail & Related papers (2023-11-27T13:14:39Z)
- Independent Distribution Regularization for Private Graph Embedding [55.24441467292359]
Graph embeddings are susceptible to attribute inference attacks, which allow attackers to infer private node attributes from the learned graph embeddings.
To address these concerns, privacy-preserving graph embedding methods have emerged.
We propose a novel approach called Private Variational Graph AutoEncoders (PVGAE) with an independent distribution penalty as a regularization term.
arXiv Detail & Related papers (2023-08-16T13:32:43Z)
- PointCaM: Cut-and-Mix for Open-Set Point Cloud Learning [72.07350827773442]
We propose to solve open-set point cloud learning using a novel Point Cut-and-Mix mechanism.
We use the Unknown-Point Simulator to simulate out-of-distribution data in the training stage.
The Unknown-Point Estimator module learns to exploit the point cloud's feature context for discriminating the known and unknown data.
arXiv Detail & Related papers (2022-12-05T03:53:51Z)
- Learning Location from Shared Elevation Profiles in Fitness Apps: A Privacy Perspective [14.886240385518716]
We study the extent to which elevation profiles can be used to predict the location of users.
We devise three plausible threat settings under which the city or borough of the targets can be predicted.
We achieve a prediction success rate ranging from 59.59% to 99.80%.
arXiv Detail & Related papers (2022-10-27T15:15:13Z)
- Federated Learning of User Authentication Models [69.93965074814292]
We propose Federated User Authentication (FedUA), a framework for privacy-preserving training of machine learning models.
FedUA adopts the federated learning framework to enable a group of users to jointly train a model without sharing the raw inputs.
We show that our method is privacy-preserving, scales with the number of users, and allows new users to be added to training without changing the output layer.
arXiv Detail & Related papers (2020-07-09T08:04:38Z)
- Privacy-Preserving Image Features via Adversarial Affine Subspace Embeddings [72.68801373979943]
Many computer vision systems require users to upload image features to the cloud for processing and storage.
We propose a new privacy-preserving feature representation.
Compared to the original features, our approach makes it significantly more difficult for an adversary to recover private information.
arXiv Detail & Related papers (2020-06-11T17:29:48Z)
- TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations [49.20701800683092]
We present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation.
The goal of this framework is to learn a feature extractor that hides private information in the intermediate representations, while maximally retaining the original information embedded in the raw data so the data collector can accomplish unknown learning tasks.
arXiv Detail & Related papers (2020-05-23T06:21:26Z)
- Privacy Shadow: Measuring Node Predictability and Privacy Over Time [1.2437226707039446]
We propose the privacy shadow for measuring how long a user remains predictable from an arbitrary point in time within the network.
We demonstrate that the length of the privacy shadow can be predicted for individual users in three real-world datasets.
arXiv Detail & Related papers (2020-04-04T23:31:32Z)