Scalable Discovery and Continuous Inventory of Personal Data at Rest in
Cloud Native Systems
- URL: http://arxiv.org/abs/2209.10412v1
- Date: Fri, 9 Sep 2022 10:45:34 GMT
- Authors: Elias Grünewald and Leonard Schurbert
- Abstract summary: Cloud native systems are processing large amounts of personal data through numerous and possibly multi-paradigmatic data stores.
From a privacy engineering perspective, a core challenge is to keep track of all exact locations where personal data is stored.
We present Teiresias, comprising i) a workflow pattern for scalable discovery of personal data at rest, and ii) a cloud native system architecture and open source prototype implementation of said workflow pattern.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cloud native systems are processing large amounts of personal data through
numerous and possibly multi-paradigmatic data stores (e.g., relational and
non-relational databases). From a privacy engineering perspective, a core
challenge is to keep track of all exact locations where personal data is
stored, as required by regulatory frameworks such as the European General Data
Protection Regulation. In this paper, we present Teiresias, comprising i) a
workflow pattern for scalable discovery of personal data at rest, and ii) a
cloud native system architecture and open source prototype implementation of
said workflow pattern. To this end, we enable a continuous inventory of
personal data featuring transparency and accountability following
DevOps/DevPrivOps practices. In particular, we scope version-controlled
Infrastructure as Code definitions, cloud-based storage, and how to integrate
the process into CI/CD pipelines. Thereafter, we provide iii) a comparative
performance evaluation demonstrating both appropriate execution times for
real-world settings, and a promising personal data detection accuracy
outperforming existing proprietary tools in public clouds.
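The abstract describes discovering personal data at rest across several data stores and keeping a continuous inventory of where it resides. The following is only a minimal sketch of that general idea, not the Teiresias implementation: the patterns, the function names `scan_records` and `build_inventory`, and the in-memory sample stores are all hypothetical, and simple regular expressions stand in for whatever detection technique the paper actually uses.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical detection patterns (assumption); a real detector would be
# far more thorough than two regular expressions.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_records(store: str, records: Iterable[Dict[str, object]]) -> List[Dict[str, str]]:
    """Return one inventory entry per (store, field, category) that matched."""
    seen = set()
    findings: List[Dict[str, str]] = []
    for record in records:
        for field, value in record.items():
            for category, pattern in PATTERNS.items():
                key = (store, field, category)
                # Report each location only once, however many rows match.
                if key not in seen and pattern.search(str(value)):
                    seen.add(key)
                    findings.append(
                        {"store": store, "field": field, "category": category}
                    )
    return findings


def build_inventory(stores: Dict[str, Iterable[Dict[str, object]]]) -> List[Dict[str, str]]:
    """Scan every store and concatenate the findings into one inventory."""
    inventory: List[Dict[str, str]] = []
    for name, records in stores.items():
        inventory.extend(scan_records(name, records))
    return inventory


if __name__ == "__main__":
    # In-memory stand-ins for a relational table and a log store (assumption).
    sample_stores = {
        "users_db": [{"id": 1, "contact": "alice@example.org", "note": "hello"}],
        "logs": [{"line": "login from 192.168.0.7"}],
    }
    for entry in build_inventory(sample_stores):
        print(entry)
```

Running such a scan on every pipeline execution and committing the resulting inventory alongside the infrastructure definitions would be one way to approximate the continuous, version-controlled inventory the abstract mentions.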
Related papers
- Object as a Service: Simplifying Cloud-Native Development through Serverless Object Abstraction [1.7416288134936873]
We propose a new paradigm, known as Object as a Service (OaaS), that encapsulates application data and functions into the cloud object abstraction.
OaaS relieves developers from resource and data management burden while offering built-in optimization features.
We develop a platform named Oparaca that offers state abstraction for structured and unstructured data with consistency and fault-tolerant guarantees.
arXiv Detail & Related papers (2024-08-09T06:55:00Z)
- PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a parameter-efficient Federated Anomaly Detection framework named PeFAD to address increasing privacy concerns.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
- ST-DPGAN: A Privacy-preserving Framework for Spatiotemporal Data Generation [19.18074489351738]
We propose a graph-based model for generating privacy-protected data.
Experiments conducted on three real-world spatiotemporal datasets validate the efficacy of our model.
The prediction model trained on our generated data maintains a competitive edge compared to the model trained on the original data.
arXiv Detail & Related papers (2024-06-04T04:43:54Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Hawk: DevOps-driven Transparency and Accountability in Cloud Native Systems [0.0]
Transparency is one of the most important principles of modern privacy regulations.
Data controllers must provide data subjects with precise information about the collection, processing, storage, and transfer of personal data.
arXiv Detail & Related papers (2023-06-04T22:09:42Z)
- Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
arXiv Detail & Related papers (2022-11-10T17:36:58Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z)
- Reasoning over Public and Private Data in Retrieval-Based Systems [29.515915401413334]
State-of-the-art systems explicitly retrieve information relevant to a user question from a background corpus before producing an answer.
While today's retrieval systems assume the corpus is fully accessible, users are often unable or unwilling to expose their private data to entities hosting public data.
We first define the PUBLIC-PRIVATE AUTOREGRESSIVE Information RETRIEVAL (PAIR) privacy framework for the novel retrieval setting over multiple privacy scopes.
arXiv Detail & Related papers (2022-03-14T13:08:51Z)
- On-Device Learning with Cloud-Coordinated Data Augmentation for Extreme Model Personalization in Recommender Systems [39.41506296601779]
We propose a new device-cloud collaborative learning framework, called CoDA, to break the dilemmas of purely cloud-based learning and on-device learning.
CoDA retrieves similar samples from the cloud's global pool to augment each user's local dataset to train the recommendation model.
Online A/B testing results show the remarkable performance improvement of CoDA over both cloud-based learning without model personalization and on-device training without data augmentation.
arXiv Detail & Related papers (2022-01-24T04:59:04Z)
- Unsupervised Model Personalization while Preserving Privacy and Scalability: An Open Problem [55.21502268698577]
This work investigates the task of unsupervised model personalization, adapted to continually evolving, unlabeled local user images.
We provide a novel Dual User-Adaptation framework (DUA) to explore the problem.
This framework flexibly disentangles user-adaptation into model personalization on the server and local data regularization on the user device.
arXiv Detail & Related papers (2020-03-30T09:35:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.