Protecting Sensitive Tabular Data in Hybrid Clouds
- URL: http://arxiv.org/abs/2312.01354v1
- Date: Sun, 3 Dec 2023 11:20:24 GMT
- Title: Protecting Sensitive Tabular Data in Hybrid Clouds
- Authors: Maya Anderson, Gidon Gershinsky, Eliot Salant, Salvador Garcia,
- Abstract summary: Regulated industries, such as Healthcare and Finance, are starting to move parts of their data and workloads to the public cloud.
We address the security and performance challenges of big data analytics using a hybrid cloud in a real-life use case from a hospital.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Regulated industries, such as Healthcare and Finance, are starting to move parts of their data and workloads to the public cloud. However, they are still reluctant to trust the public cloud with their most sensitive records, and hence leave them on their premises, leveraging the hybrid cloud architecture. We address the security and performance challenges of big data analytics using a hybrid cloud in a real-life use case from a hospital. In this use case, the hospital collects sensitive patient data and wants to run analytics on it in order to lower antibiotic resistance, a significant challenge in healthcare. We show that it is possible to run large-scale analytics on data that is securely stored in the public cloud, encrypted using Apache Parquet Modular Encryption (PME), without significant performance losses even if the secret encryption keys are stored on-premises. PME is a standard mechanism for data encryption and key management, not specific to any public cloud, and therefore helps prevent vendor lock-in. It also provides privacy and integrity guarantees, and enables granular access control to the data. We also present an innovation in PME that lowers the performance hit incurred by calls to the Key Management Service. Our solution therefore enables protecting large amounts of sensitive data in hybrid clouds while still allowing valuable insights to be gained from it efficiently.
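The workflow the abstract describes can be sketched with PyArrow's Parquet Modular Encryption bindings. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: the column names, master-key names, and the toy in-memory KMS client are hypothetical stand-ins for the hospital's on-premises Key Management Service, and the double_wrapping/cache_lifetime settings only hint at the kind of KMS-call reduction the paper discusses.

```python
# Minimal sketch: write and read a Parquet file with column-level encryption
# via PyArrow's Parquet Modular Encryption (PME) API. The KMS client below is
# a toy in-memory stand-in; a real deployment would call the on-premises KMS
# over a secure channel so master keys never leave the hospital's premises.
from datetime import timedelta
import base64

import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.parquet.encryption as pe


class InMemoryKmsClient(pe.KmsClient):
    """Toy KMS client: wraps/unwraps data keys with locally held master keys."""

    def __init__(self, kms_connection_config):
        pe.KmsClient.__init__(self)
        # Hypothetical master keys; in production these stay inside the KMS.
        self.master_keys = {"footer_key": b"0123456789012345",
                            "patient_key": b"1234567890123450"}

    def wrap_key(self, key_bytes, master_key_identifier):
        # A real client would send key_bytes to the KMS "wrap" endpoint.
        master = self.master_keys[master_key_identifier]
        return base64.b64encode(master + key_bytes)

    def unwrap_key(self, wrapped_key, master_key_identifier):
        master = self.master_keys[master_key_identifier]
        return base64.b64decode(wrapped_key)[len(master):]


crypto_factory = pe.CryptoFactory(lambda config: InMemoryKmsClient(config))
kms_config = pe.KmsConnectionConfig()  # URL/credentials of the on-prem KMS go here

# Encrypt the sensitive columns and the footer; double wrapping plus a local
# key-encryption-key cache reduces the number of round trips to the KMS.
encryption_config = pe.EncryptionConfiguration(
    footer_key="footer_key",
    column_keys={"patient_key": ["patient_id", "diagnosis"]},
    encryption_algorithm="AES_GCM_V1",
    double_wrapping=True,
    cache_lifetime=timedelta(minutes=10),
    data_key_length_bits=256,
)

table = pa.table({"patient_id": [1, 2], "diagnosis": ["a", "b"], "ward": [3, 4]})
write_props = crypto_factory.file_encryption_properties(kms_config, encryption_config)
with pq.ParquetWriter("records.parquet.encrypted", table.schema,
                      encryption_properties=write_props) as writer:
    writer.write_table(table)

# Read path: the analytics engine in the public cloud only sees ciphertext
# unless the KMS authorizes unwrapping of the relevant column keys.
read_props = crypto_factory.file_decryption_properties(
    kms_config, pe.DecryptionConfiguration(cache_lifetime=timedelta(minutes=10)))
restored = pq.ParquetFile("records.parquet.encrypted",
                          decryption_properties=read_props).read()
```

Because only the named columns and the footer are encrypted under separate keys, access can be granted per column by the key management layer, which is how PME supports the granular access control mentioned above.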
Related papers
- CCA-Secure Key-Aggregate Proxy Re-Encryption for Secure Cloud Storage [1.4610685586329806]
Data protection in cloud storage is the key to the survival of the cloud industry.
Proxy Re-Encryption schemes enable users to convert their ciphertext into others' ciphertext by using a re-encryption key.
Recently, we lowered the key storage cost of C-PREs to constant size and proposed the first Key-Aggregate Proxy Re-Encryption scheme.
arXiv Detail & Related papers (2024-10-10T17:02:49Z) - K-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data [0.0]
In public cloud environments, even when data is encrypted, the cloud service provider typically controls the encryption keys.
This situation makes traditional privacy-preserving classification systems inadequate.
We propose a secure k nearest neighbor classification algorithm for encrypted, outsourced data.
arXiv Detail & Related papers (2024-07-05T19:44:17Z) - Ciphertext-Only Attack on a Secure $k$-NN Computation on Cloud [0.0]
Encryption can prevent unauthorized access, data breaches, and the resultant financial loss, reputation damage, and legal issues.
Sanyashi et al. proposed an encryption scheme to facilitate privacy-preserving $k$-NN computation on the cloud.
We give an efficient algorithm and empirically demonstrate that their encryption scheme is vulnerable to the ciphertext-only attack (COA).
arXiv Detail & Related papers (2024-03-14T03:53:01Z) - A Review on Searchable Encryption Functionality and the Evaluation of Homomorphic Encryption [0.0]
Businesses, such as Netflix and PayPal, rely on the Cloud for data storage, computing power, and other services.
There are security and privacy concerns regarding the Cloud.
To protect data in the Cloud, it should be encrypted before it is uploaded.
This paper reviews the functionality of Searchable Encryption, mostly related to Cloud services, in the years 2019 to 2023.
arXiv Detail & Related papers (2023-12-22T04:48:00Z) - Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) Test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data.
arXiv Detail & Related papers (2023-05-17T12:23:38Z) - THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is in demand among cloud service users.
We introduce THE-X, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z) - BigBird: Big Data Storage and Analytics at Scale in Hybrid Cloud [0.0]
This paper showcases our approach in designing a scalable big data storage and analytics management framework using BigQuery in Google Cloud Platform.
Although the paper discusses the framework implementation in Google Cloud Platform, it can easily be applied to all major cloud providers.
arXiv Detail & Related papers (2022-03-22T05:42:46Z) - Reinforcement Learning on Encrypted Data [58.39270571778521]
We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces.
Our results highlight that the agent is still capable of learning in small state spaces even in the presence of non-deterministic encryption, but performance collapses in more complex environments.
arXiv Detail & Related papers (2021-09-16T21:59:37Z) - NeuraCrypt: Hiding Private Health Data via Random Neural Networks for Public Training [64.54200987493573]
We propose NeuraCrypt, a private encoding scheme based on random deep neural networks.
NeuraCrypt encodes raw patient data using a randomly constructed neural network known only to the data-owner.
We show that NeuraCrypt achieves competitive accuracy to non-private baselines on a variety of x-ray tasks.
arXiv Detail & Related papers (2021-06-04T13:42:21Z) - Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to contain massive infection and decrease the number of deaths.
In this sense, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database governed by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z) - Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) has been receiving more and more attention recently for its capability to perform computations over encrypted data.
We propose a novel, general distributed HE-based data mining framework as one step towards solving the scaling problem.
We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)