Puda: Private User Dataset Agent for User-Sovereign and Privacy-Preserving Personalized AI
- URL: http://arxiv.org/abs/2602.08268v2
- Date: Tue, 10 Feb 2026 05:00:53 GMT
- Title: Puda: Private User Dataset Agent for User-Sovereign and Privacy-Preserving Personalized AI
- Authors: Akinori Maeda, Yuto Sekiya, Sota Sugimura, Tomoya Asai, Yu Tsuda, Kohei Ikeda, Hiroshi Fujii, Kohei Watanabe,
- Abstract summary: We propose Puda, a user-sovereign architecture that aggregates data across services and enables client-side management.<n>Puda allows users to control data sharing at three privacy levels: (i) Detailed Browsing History, (ii) Extracted Keywords, and (iii) Predefined Category Subsets.<n>Our results show that Puda enables effective multi-granularity management, offering practical choices to mitigate the privacy-personalization trade-off.
- Score: 0.35132824436572685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personal data centralization among dominant platform providers including search engines, social networking services, and e-commerce has created siloed ecosystems that restrict user sovereignty, thereby impeding data use across services. Meanwhile, the rapid proliferation of Large Language Model (LLM)-based agents has intensified demand for highly personalized services that require the dynamic provision of diverse personal data. This presents a significant challenge: balancing the utilization of such data with privacy protection. To address this challenge, we propose Puda (Private User Dataset Agent), a user-sovereign architecture that aggregates data across services and enables client-side management. Puda allows users to control data sharing at three privacy levels: (i) Detailed Browsing History, (ii) Extracted Keywords, and (iii) Predefined Category Subsets. We implemented Puda as a browser-based system that serves as a common platform across diverse services and evaluated it through a personalized travel planning task. Our results show that providing Predefined Category Subsets achieves 97.2% of the personalization performance (evaluated via an LLM-as-a-Judge framework across three criteria) obtained when sharing Detailed Browsing History. These findings demonstrate that Puda enables effective multi-granularity management, offering practical choices to mitigate the privacy-personalization trade-off. Overall, Puda provides an AI-native foundation for user sovereignty, empowering users to safely leverage the full potential of personalized AI.
Related papers
- Your Privacy Depends on Others: Collusion Vulnerabilities in Individual Differential Privacy [50.66105844449181]
Individual Differential Privacy (iDP) promises users control over their privacy, but this promise can be broken in practice.<n>We reveal a previously overlooked vulnerability in sampling-based iDP mechanisms.<n>We propose $(varepsilon_i,_i,overline)$-iDP a privacy contract that uses $$-divergences to provide users with a hard upper bound on their excess vulnerability.
arXiv Detail & Related papers (2026-01-19T10:26:12Z) - Towards Usable Privacy Management for IoT TAPs: Deriving Privacy Clusters and Preference Profiles [1.2744523252873352]
IoT Trigger-Action Platforms (TAPs) typically offer coarse-grained permission controls.<n>This paper contributes to usable privacy management for TAPs by deriving privacy clusters and profiles for different types of users.
arXiv Detail & Related papers (2025-11-14T12:08:58Z) - A User-Centric, Privacy-Preserving, and Verifiable Ecosystem for Personal Data Management and Utilization [2.9204726853775664]
This paper introduces a novel decentralized, privacy-preserving architecture that handles heterogeneous personal information.<n>Unlike traditional models, our system grants users complete data ownership and control, allowing them to selectively share information without compromising privacy.
arXiv Detail & Related papers (2025-06-27T20:05:46Z) - PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data [76.21047984886273]
Personalization is critical in AI assistants, particularly in the context of private AI models that work with individual users.<n>Due to the sensitive nature of such data, there are no publicly available datasets that allow us to assess an AI model's ability to understand users.<n>We introduce a synthetic data generation pipeline that creates diverse, realistic user profiles and private documents simulating human activities.
arXiv Detail & Related papers (2025-02-28T00:43:35Z) - Towards Personal Data Sharing Autonomy:A Task-driven Data Capsule Sharing System [5.076862984714449]
We introduce a novel task-driven personal data sharing system based on the data capsule paradigm realizing personal data sharing autonomy.
Specifically, we present a tamper-resistant data capsule encapsulation method, where the data capsule is the minimal unit for independent and secure personal data storage and sharing.
arXiv Detail & Related papers (2024-09-27T05:13:33Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - Tapping into Privacy: A Study of User Preferences and Concerns on
Trigger-Action Platforms [0.0]
The Internet of Things (IoT) devices are rapidly increasing in popularity, with more individuals using Internet-connected devices that continuously monitor their activities.
This work explores privacy concerns and expectations of end-users related to Trigger-Action platforms (TAPs) in the context of the Internet of Things (IoT)
TAPs allow users to customize their smart environments by creating rules that trigger actions based on specific events or conditions.
arXiv Detail & Related papers (2023-08-11T14:25:01Z) - Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning.
By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
arXiv Detail & Related papers (2023-04-06T05:20:16Z) - Cross-Network Social User Embedding with Hybrid Differential Privacy
Guarantees [81.6471440778355]
We propose a Cross-network Social User Embedding framework, namely DP-CroSUE, to learn the comprehensive representations of users in a privacy-preserving way.
In particular, for each heterogeneous social network, we first introduce a hybrid differential privacy notion to capture the variation of privacy expectations for heterogeneous data types.
To further enhance user embeddings, a novel cross-network GCN embedding model is designed to transfer knowledge across networks through those aligned users.
arXiv Detail & Related papers (2022-09-04T06:22:37Z) - Efficient User-Centric Privacy-Friendly and Flexible Wearable Data Aggregation and Sharing [9.532148238768213]
Wearable devices can offer services to individuals and the public.
Wearable data collected by cloud providers may pose privacy risks.
We propose a novel, efficient, user-centric, privacy-friendly, and flexible data aggregation and sharing scheme, named SAMA.
arXiv Detail & Related papers (2022-03-01T13:51:52Z) - Unsupervised Model Personalization while Preserving Privacy and
Scalability: An Open Problem [55.21502268698577]
This work investigates the task of unsupervised model personalization, adapted to continually evolving, unlabeled local user images.
We provide a novel Dual User-Adaptation framework (DUA) to explore the problem.
This framework flexibly disentangles user-adaptation into model personalization on the server and local data regularization on the user device.
arXiv Detail & Related papers (2020-03-30T09:35:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.