Robust Representation Learning for Privacy-Preserving Machine Learning:
A Multi-Objective Autoencoder Approach
- URL: http://arxiv.org/abs/2309.04427v1
- Date: Fri, 8 Sep 2023 16:41:25 GMT
- Title: Robust Representation Learning for Privacy-Preserving Machine Learning:
A Multi-Objective Autoencoder Approach
- Authors: Sofiane Ouaari, Ali Burak \"Unal, Mete Akg\"un, Nico Pfeifer
- Abstract summary: We propose a robust representation learning framework for privacy-preserving machine learning (ppML)
Our method centers on training autoencoders in a multi-objective manner and then concatenating the latent and learned features from the encoding part as the encoded form of our data.
With our proposed framework, we can share our data and use third-party tools without the threat of revealing its original form.
- Score: 0.9831489366502302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several domains increasingly rely on machine learning in their applications.
The resulting heavy dependence on data has led to the emergence of various laws
and regulations around data ethics and privacy and growing awareness of the
need for privacy-preserving machine learning (ppML). Current ppML techniques
utilize methods that are either purely based on cryptography, such as
homomorphic encryption, or that introduce noise into the input, such as
differential privacy. The main criticism of these techniques is that they
are either too slow or trade off a model's performance for
improved confidentiality. To address this performance reduction, we aim to
leverage robust representation learning as a way of encoding our data while
optimizing the privacy-utility trade-off. Our method centers on training
autoencoders in a multi-objective manner and then concatenating the latent and
learned features from the encoding part as the encoded form of our data. Such a
deep learning-powered encoding can then safely be sent to a third party for
intensive training and hyperparameter tuning. With our proposed framework, we
can share our data and use third-party tools without the threat of
revealing its original form. We empirically validate our results in unimodal
and multimodal settings, the latter following a vertical splitting scheme, and
show improved performance over the state of the art.
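As a concrete illustration of the pipeline the abstract describes, below is a minimal PyTorch sketch: an autoencoder trained jointly on a reconstruction objective and an auxiliary classification objective (one plausible pair of objectives; the paper's exact objectives, layer sizes, and loss weighting are not given in the abstract and are assumptions here), with the latent code concatenated to an intermediate encoder feature map to form the shareable encoding.

```python
import torch
import torch.nn as nn

class MultiObjectiveAE(nn.Module):
    """Autoencoder trained on two objectives at once; the shareable
    encoding concatenates the bottleneck with an intermediate encoder
    feature map (all sizes are hypothetical)."""

    def __init__(self, in_dim=64, hid_dim=32, lat_dim=16, n_classes=2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.enc2 = nn.Linear(hid_dim, lat_dim)
        self.dec = nn.Sequential(nn.Linear(lat_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, in_dim))
        self.head = nn.Linear(lat_dim, n_classes)  # auxiliary task objective

    def forward(self, x):
        h = self.enc1(x)   # learned intermediate features
        z = self.enc2(h)   # latent code
        return self.dec(z), self.head(z), torch.cat([z, h], dim=1)

model = MultiObjectiveAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 64)                # toy batch
y = torch.randint(0, 2, (8,))         # toy labels

x_hat, logits, encoding = model(x)    # `encoding` is what would be shared
loss = nn.functional.mse_loss(x_hat, x) + nn.functional.cross_entropy(logits, y)
opt.zero_grad()
loss.backward()
opt.step()                            # one joint (multi-objective) step
```

Only the concatenated `encoding` would be handed to a third party for intensive training and tuning; the raw inputs and the decoder stay with the data owner.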
Related papers
- FLUE: Federated Learning with Un-Encrypted model weights [0.0]
Federated learning enables devices to collaboratively train a shared model while keeping training data locally stored.
Recent research emphasizes using encrypted model parameters during training.
This paper introduces a novel federated learning algorithm that leverages coded local gradients without encryption.
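The summary does not detail FLUE's gradient-coding scheme, so the sketch below shows only the baseline pattern it builds on: plain federated averaging, where clients train on local data and share unencrypted model weights. All names and dimensions are illustrative.

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One gradient step of least-squares regression on a client's
    private data; the data itself never leaves the client."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(0)
w_global = np.zeros(5)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]

for _ in range(10):  # communication rounds
    local_ws = [local_step(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_ws, axis=0)  # server averages plain weights
```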
arXiv Detail & Related papers (2024-07-26T14:04:57Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT faces limitations such as the scarcity of instruction data and the risk of training-data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Learning in the Dark: Privacy-Preserving Machine Learning using Function Approximation [1.8907108368038215]
Learning in the Dark is a privacy-preserving machine learning model that classifies encrypted images with high accuracy by performing computations directly on encrypted data.
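The "function approximation" in the title usually refers to replacing non-polynomial activations with low-degree polynomials, since homomorphic encryption schemes evaluate only additions and multiplications. A generic, scheme-agnostic illustration of that step (not the paper's exact construction):

```python
import numpy as np

# Fit a degree-3 polynomial to the sigmoid on [-4, 4]; an HE scheme can
# then evaluate the activation using only additions and multiplications.
xs = np.linspace(-4.0, 4.0, 200)
sigmoid = 1.0 / (1.0 + np.exp(-xs))
coeffs = np.polyfit(xs, sigmoid, deg=3)      # least-squares fit

approx = np.polyval(coeffs, xs)
print("max abs error:", np.abs(approx - sigmoid).max())  # small on this range
```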
arXiv Detail & Related papers (2023-09-15T06:45:58Z)
- PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives for building families of encoding schemes that motivate the use of random deep neural networks.
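A toy version of the random-network encoding primitive mentioned above: data is pushed through a fixed, randomly initialized layer whose weights are kept private, and only the encodings (with public labels) are released. The single-layer form and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(64, 128))     # fixed random weights, kept private

def random_encode(X):
    """Encode data with a frozen, randomly initialized layer; only the
    encodings (plus public labels) would be released."""
    return np.maximum(X @ W, 0.0)  # ReLU nonlinearity over a random projection

X_private = rng.normal(size=(10, 64))
X_public = random_encode(X_private)  # shareable representation
```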
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can stem from biases in data acquisition rather than from the task itself.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Privacy-Preserving Chaotic Extreme Learning Machine with Fully Homomorphic Encryption [5.010425616264462]
We propose a Chaotic Extreme Learning Machine and its encrypted form using Fully Homomorphic Encryption.
Our proposed method performs better than or comparably to the traditional Extreme Learning Machine on most of the datasets.
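For reference, a standard (unencrypted, non-chaotic) Extreme Learning Machine fits in a few lines: hidden weights are random and fixed, and only the output weights are solved in closed form. The chaotic-map initialization and the FHE evaluation from the paper are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                 # toy inputs
y = (X.sum(axis=1) > 0).astype(float)         # toy binary targets

W_hid = rng.normal(size=(8, 50))              # random hidden weights, never trained
H = np.tanh(X @ W_hid)                        # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form output weights

pred = (H @ beta > 0.5).astype(float)
print("train accuracy:", (pred == y).mean())
```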
arXiv Detail & Related papers (2022-08-04T11:29:52Z)
- Multimodal Masked Autoencoders Learn Transferable Representations [127.35955819874063]
We propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE).
M3AE learns a unified encoder for both vision and language data via masked token prediction.
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
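A bare-bones sketch of the masked-token-prediction signal M3AE trains on, with the multimodal tokenization, the unified vision-language encoder details, and the masking ratio simplified away (all sizes are illustrative):

```python
import torch
import torch.nn as nn

vocab, dim = 1000, 32
embed = nn.Embedding(vocab + 1, dim)          # index `vocab` = [MASK]
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (4, 16))     # toy token sequences
mask = torch.rand(tokens.shape) < 0.5         # mask half the positions
inp = tokens.masked_fill(mask, vocab)         # swap in the [MASK] id

logits = head(encoder(embed(inp)))
# Train by predicting the original tokens at the masked positions only.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```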
arXiv Detail & Related papers (2022-05-27T19:09:42Z)
- Privacy-Preserving Wavelet Neural Network with Fully Homomorphic Encryption [5.010425616264462]
Privacy-Preserving Machine Learning (PPML) aims to protect the privacy of, and provide security for, the data used to build machine learning models.
We propose a fully homomorphically encrypted wavelet neural network that protects privacy without compromising model efficiency.
arXiv Detail & Related papers (2022-05-26T10:40:31Z)
- Homomorphic Encryption and Federated Learning based Privacy-Preserving CNN Training: COVID-19 Detection Use-Case [0.41998444721319217]
This paper proposes a privacy-preserving federated learning algorithm for medical data using homomorphic encryption.
The proposed algorithm uses a secure multi-party computation protocol to protect the deep learning model from adversaries.
arXiv Detail & Related papers (2022-04-16T08:38:35Z)
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) can keep private data on mobile terminals (MTs) while still training useful models from that data.
From an information-theoretic viewpoint, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
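The UDP mechanism from the last entry is simple to sketch: each user clips its model update to a bounded norm and adds Gaussian noise before upload. The clipping norm and noise scale below are illustrative placeholders, not the paper's calibrated values.

```python
import numpy as np

def privatize_update(w, clip=1.0, sigma=0.5, rng=None):
    """Clip a model update to a bounded L2 norm, then add Gaussian noise.
    `clip` and `sigma` are placeholders; the paper calibrates the noise
    to a user-level (epsilon, delta) privacy budget."""
    rng = rng or np.random.default_rng()
    w = w * min(1.0, clip / max(np.linalg.norm(w), 1e-12))  # bound influence
    return w + rng.normal(scale=sigma * clip, size=w.shape)

update = np.random.default_rng(1).normal(size=10)
safe_update = privatize_update(update)  # this perturbed update is uploaded
```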