Personalized Language Model Learning on Text Data Without User Identifiers
- URL: http://arxiv.org/abs/2501.06062v1
- Date: Fri, 10 Jan 2025 15:46:19 GMT
- Title: Personalized Language Model Learning on Text Data Without User Identifiers
- Authors: Yucheng Ding, Yangwenjian Tan, Xiangyu Liu, Chaoyue Niu, Fandong Meng, Jie Zhou, Ning Liu, Fan Wu, Guihai Chen
- Abstract summary: We propose to let each mobile device maintain a user-specific distribution to dynamically generate user embeddings.
To prevent the cloud from tracking users via uploaded embeddings, the local distributions of different users should either be derived from a linearly dependent space or be close to each other.
Evaluation on both public and industrial datasets reveals a remarkable improvement in accuracy from incorporating anonymous user embeddings.
- Score: 79.36212347601223
- Abstract: In many practical natural language applications, user data are highly sensitive, requiring anonymous uploads of text data from mobile devices to the cloud without user identifiers. However, the absence of user identifiers restricts the ability of cloud-based language models to provide personalized services, which are essential for catering to diverse user needs. The trivial method of replacing an explicit user identifier with a static user embedding as model input still compromises data anonymization. In this work, we propose to let each mobile device maintain a user-specific distribution to dynamically generate user embeddings, thereby breaking the one-to-one mapping between an embedding and a specific user. We further theoretically demonstrate that to prevent the cloud from tracking users via uploaded embeddings, the local distributions of different users should either be derived from a linearly dependent space to avoid identifiability or be close to each other to prevent accurate attribution. Evaluation on both public and industrial datasets using different language models reveals a remarkable improvement in accuracy from incorporating anonymous user embeddings, while meeting the real-time inference requirement.
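As a rough illustration of the proposed mechanism, here is a minimal sketch (not the authors' implementation): each device keeps a user-specific Gaussian whose mean is confined to a shared low-rank subspace, so the means lie in a linearly dependent space, and samples a fresh embedding per upload. The dimensions, the DeviceEmbeddingSampler class, and the noise scale sigma are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): each device keeps a
# user-specific Gaussian over embeddings and samples a fresh embedding
# per upload, so no fixed vector identifies the user.
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 64    # embedding dimension (assumed)
RANK = 4        # rank of the shared subspace, < EMB_DIM (assumed)

# Shared basis of a linearly dependent (low-rank) subspace.
B = rng.standard_normal((RANK, EMB_DIM))

class DeviceEmbeddingSampler:
    """Per-device distribution that dynamically generates user embeddings."""
    def __init__(self, sigma: float = 0.5):
        # User-specific coefficients place the mean inside span(B).
        self.coeffs = rng.standard_normal(RANK)
        self.mean = self.coeffs @ B
        self.sigma = sigma

    def sample(self) -> np.ndarray:
        # A fresh embedding per request breaks the one-to-one
        # user-to-embedding mapping seen by the cloud.
        return self.mean + self.sigma * rng.standard_normal(EMB_DIM)

# Two devices upload different embeddings on every request.
alice, bob = DeviceEmbeddingSampler(), DeviceEmbeddingSampler()
print(alice.sample()[:4])
print(alice.sample()[:4])  # differs from the previous upload
print(bob.sample()[:4])
```

Because every upload carries a newly sampled embedding, the cloud never observes a fixed per-user vector; confining the means to span(B) corresponds to the linear-dependence condition above, and raising sigma pushes different users' distributions closer together, which relates to the paper's alternative closeness condition.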
Related papers
- Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach [49.63614966954833]
Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework while preserving privacy.
Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms and then encode personalized information in a user embedding vector.
This paper proposes a novel personalized FedCF method that preserves users' personalized information in a latent variable and a neural model simultaneously.
arXiv Detail & Related papers (2024-08-16T05:49:14Z)
- Learning User Embeddings from Human Gaze for Personalised Saliency Prediction [12.361829928359136]
We present a novel method to extract user embeddings from pairs of natural images and corresponding saliency maps.
At the core of our method is a Siamese convolutional neural encoder that learns the user embeddings by contrasting the image and personal saliency map pairs of different users.
arXiv Detail & Related papers (2024-03-20T14:58:40Z)
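As a hedged sketch of the Siamese contrastive setup summarized in the entry above (the 4-channel stacking of image and saliency map, the layer sizes, and the margin loss are assumptions, not the authors' architecture):

```python
# Sketch in the spirit of the paper: one shared CNN encodes stacked
# (image, personal saliency map) inputs, and a contrastive loss pulls
# same-user pairs together and pushes different users apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, emb_dim: int = 32):
        super().__init__()
        # 4 input channels: RGB image + 1-channel saliency map (assumed).
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(z1, z2, same_user, margin: float = 0.5):
    # Standard pairwise contrastive loss on embedding distance.
    d = (z1 - z2).norm(dim=-1)
    return torch.where(same_user, d**2, F.relu(margin - d)**2).mean()

enc = SiameseEncoder()
a = enc(torch.randn(8, 4, 64, 64))  # (image, saliency) stacks from user A
b = enc(torch.randn(8, 4, 64, 64))  # stacks from user B
loss = contrastive_loss(a, b, same_user=torch.zeros(8, dtype=torch.bool))
loss.backward()
```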
- Perennial Semantic Data Terms of Use for Decentralized Web [14.831528850463373]
We propose a novel formal description of Data Terms of Use (DToU).
Users and applications specify their own parts of the DToU policy with local knowledge.
This constitutes a "perennial" DToU language, where policy authoring occurs only once.
arXiv Detail & Related papers (2024-03-12T12:18:20Z)
- User Inference Attacks on Large Language Models [26.616016510555088]
Fine-tuning is a common and effective method for tailoring large language models (LLMs) to specialized tasks and applications.
We study the privacy implications of fine-tuning LLMs on user data.
arXiv Detail & Related papers (2023-10-13T17:24:52Z)
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback [83.95599156217945]
We focus on assistive typing applications in which a user cannot operate a keyboard, but can supply other inputs.
Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes.
We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user.
arXiv Detail & Related papers (2022-03-04T00:07:20Z)
- Federated Learning of User Verification Models Without Sharing Embeddings [73.27015469166166]
Federated User Verification (FedUV) is a framework in which users jointly learn a set of vectors and maximize the correlation of their instance embeddings with a secret linear combination of those vectors.
We show that choosing the linear combinations from the codewords of an error-correcting code allows users to collaboratively train the model without revealing their embedding vectors.
arXiv Detail & Related papers (2021-04-18T08:51:39Z)
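A small sketch of the FedUV-style construction summarized above, under stated assumptions: Hadamard-matrix rows stand in for the error-correcting code, and hadamard_codeword and correlation_objective are hypothetical helpers rather than the paper's API.

```python
# Hedged sketch: users share a learned set of vectors V, each user holds
# a secret codeword from an error-correcting code, and training maximizes
# the correlation of instance embeddings with the combination c @ V, so
# raw embeddings never leave the device.
import numpy as np

rng = np.random.default_rng(1)

K, D = 8, 32                      # number of shared vectors, embedding dim
V = rng.standard_normal((K, D))   # jointly learned vector set (shared)

def hadamard_codeword(user_id: int, k: int) -> np.ndarray:
    # Rows of a Hadamard matrix form a simple +/-1 error-correcting code;
    # used here as a user's secret linear-combination weights.
    H = np.array([[1.0]])
    while H.shape[0] < k:
        H = np.block([[H, H], [H, -H]])
    return H[user_id % k]

def correlation_objective(embedding: np.ndarray, user_id: int) -> float:
    # Per-user target direction: a secret linear combination of V.
    target = hadamard_codeword(user_id, K) @ V
    return float(
        embedding @ target / (np.linalg.norm(embedding) * np.linalg.norm(target))
    )

emb = rng.standard_normal(D)  # stand-in for a model's instance embedding
print(correlation_objective(emb, user_id=3))
```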
- Federated Learning of User Authentication Models [69.93965074814292]
We propose Federated User Authentication (FedUA), a framework for privacy-preserving training of machine learning models.
FedUA adopts the federated learning framework to enable a group of users to jointly train a model without sharing their raw inputs.
We show that our method is privacy-preserving, scales with the number of users, and allows new users to be added to training without changing the output layer.
arXiv Detail & Related papers (2020-07-09T08:04:38Z)
- Unsupervised Model Personalization while Preserving Privacy and Scalability: An Open Problem [55.21502268698577]
This work investigates the task of unsupervised model personalization, adapted to continually evolving, unlabeled local user images.
We provide a novel Dual User-Adaptation framework (DUA) to explore the problem.
This framework flexibly disentangles user-adaptation into model personalization on the server and local data regularization on the user device.
arXiv Detail & Related papers (2020-03-30T09:35:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.