Data Encoding For Healthcare Data Democratisation and Information
Leakage Prevention
- URL: http://arxiv.org/abs/2305.03710v1
- Date: Fri, 5 May 2023 17:50:50 GMT
- Title: Data Encoding For Healthcare Data Democratisation and Information
Leakage Prevention
- Authors: Anshul Thakur, Tingting Zhu, Vinayak Abrol, Jacob Armstrong, Yujiang
Wang, David A. Clifton
- Abstract summary: This paper argues that irreversible data encoding can provide an effective solution to achieve data democratization.
It exploits random projections and random quantum encoding to realize this framework for dense and longitudinal or time-series data.
Experimental evaluation highlights that models trained on encoded time-series data effectively uphold the information bottleneck principle.
- Score: 23.673071967945358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of data democratization and information leakage from trained models
hinder the development and acceptance of robust deep learning-based healthcare
solutions. This paper argues that irreversible data encoding can provide an
effective solution to achieve data democratization without violating the
privacy constraints imposed on healthcare data and clinical models. An ideal
encoding framework transforms the data into a new space where it is
imperceptible to a manual or computational inspection. However, encoded data
should preserve the semantics of the original data such that deep learning
models can be trained effectively. This paper hypothesizes the characteristics
of the desired encoding framework and then exploits random projections and
random quantum encoding to realize this framework for dense and longitudinal or
time-series data. Experimental evaluation highlights that models trained on
encoded time-series data effectively uphold the information bottleneck
principle and, hence, exhibit less information leakage from trained models.
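The random-projection half of the proposed framework can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal, hypothetical example of irreversible encoding via a Gaussian random projection, where the projection matrix is kept secret (or discarded) so the original features cannot be recovered, while pairwise distances are approximately preserved (Johnson-Lindenstrauss lemma), leaving enough structure for downstream model training:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_projection_encode(X: np.ndarray, out_dim: int) -> np.ndarray:
    """Encode rows of X (n_samples, n_features) into out_dim dimensions
    using a Gaussian random matrix scaled to roughly preserve norms."""
    n_features = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(n_features, out_dim))
    # Without access to R, inverting the mapping back to X is not feasible,
    # which is what makes the encoding "irreversible" in practice.
    return X @ R

# Toy stand-in for dense healthcare records: 4 samples, 128 features each.
X = rng.normal(size=(4, 128))
Z = random_projection_encode(X, out_dim=64)
print(Z.shape)  # (4, 64)
```

The random quantum encoding branch for time-series data is more involved and is not sketched here.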
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - TSynD: Targeted Synthetic Data Generation for Enhanced Medical Image Classification [0.011037620731410175]
This work aims to guide the generative model to synthesize data with high uncertainty.
We alter the feature space of the autoencoder through an optimization process.
We improve the robustness against test-time data augmentations and adversarial attacks on several classification tasks.
arXiv Detail & Related papers (2024-06-25T11:38:46Z) - How Good Are Synthetic Medical Images? An Empirical Study with Lung
Ultrasound [0.3312417881789094]
Adding synthetic training data using generative models offers a low-cost method to deal with the data scarcity challenge.
We show that training with both synthetic and real data outperforms training with real data alone.
arXiv Detail & Related papers (2023-10-05T15:42:53Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z) - Distributed learning optimisation of Cox models can leak patient data:
Risks and solutions [0.0]
This paper demonstrates that the optimisation of a Cox survival model can lead to patient data leakage.
We suggest a way to optimise and validate a Cox model that avoids these problems in a secure way.
arXiv Detail & Related papers (2022-04-12T14:56:20Z) - Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z) - Generative Low-bitwidth Data Free Quantization [44.613912463011545]
We propose Generative Low-bitwidth Data Free Quantization (GDFQ) to remove the data dependence burden.
With the help of generated data, we can quantize a model by learning knowledge from the pre-trained model.
Our method achieves much higher accuracy on 4-bit quantization than the existing data free quantization method.
arXiv Detail & Related papers (2020-03-07T16:38:34Z)
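GDFQ's contribution is generating calibration data from the pre-trained model itself; the quantizer underneath such methods is typically a uniform low-bitwidth one. As a minimal, hypothetical sketch (the data-free calibration step is omitted), 4-bit uniform quantization of a weight tensor can look like this:

```python
import numpy as np

def uniform_quantize(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Uniformly quantize w to n_bits levels and return the
    dequantized (reconstructed) tensor."""
    qmax = 2 ** n_bits - 1                      # 15 levels above zero for 4 bits
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((w - lo) / scale).clip(0, qmax)  # integer codes in [0, qmax]
    return q * scale + lo                          # map codes back to floats

rng = np.random.default_rng(1)
w = rng.normal(size=1000)        # stand-in for a layer's weights
w_hat = uniform_quantize(w, n_bits=4)
print(np.unique(w_hat).size <= 16)  # True: at most 2**4 distinct values
```

The rounding error per weight is at most half the quantization step, which is why methods like GDFQ focus on calibrating activation ranges with (generated) data rather than on the quantizer itself.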
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.