Privacy-Preserving Distributed Learning in the Analog Domain
- URL: http://arxiv.org/abs/2007.08803v1
- Date: Fri, 17 Jul 2020 07:56:39 GMT
- Title: Privacy-Preserving Distributed Learning in the Analog Domain
- Authors: Mahdi Soleymani, Hessam Mahdavifar, A. Salman Avestimehr
- Abstract summary: We consider the problem of distributed learning over data while keeping it private from the computational servers.
We propose a novel algorithm to solve the problem when data is in the analog domain.
We show how the proposed framework can be adapted to perform computation tasks when data is represented using floating-point numbers.
- Score: 23.67685616088422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the critical problem of distributed learning over data while
keeping it private from the computational servers. The state-of-the-art
approaches to this problem rely on quantizing the data into a finite field, so
that the cryptographic approaches for secure multiparty computing can then be
employed. These approaches, however, can result in substantial accuracy losses
due to fixed-point representation of the data and computation overflows. To
address these critical issues, we propose a novel algorithm to solve the
problem when data is in the analog domain, e.g., the field of real/complex
numbers. We characterize the privacy of the data from both
information-theoretic and cryptographic perspectives, while establishing a
connection between the two notions in the analog domain. More specifically, the
well-known connection between the distinguishing security (DS) and the mutual
information security (MIS) metrics is extended from the discrete domain to the
continuous domain. This is then utilized to bound the amount of information
about the data leaked to the servers in our protocol, in terms of the DS
metric, using well-known results on the capacity of a single-input
multiple-output (SIMO) channel with correlated noise. It is shown how the
proposed framework can be adapted to perform computation tasks when data is
represented using floating-point numbers. We then show that this leads to a
fundamental trade-off between the privacy level of data and accuracy of the
result. As an application, we also show how to train a machine learning model
while keeping the data as well as the trained model private. Then numerical
results are shown for experiments on the MNIST dataset. Furthermore,
experimental advantages are shown compared to fixed-point implementations over
finite fields.
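The two sketches below are illustrative additions, not material from the paper. First, a generic Pinsker-style bound that conveys the flavor of a DS-MIS connection in the continuous domain; the paper's exact DS/MIS definitions, constants, and theorem may differ.

```latex
% X: the private data, Z: a server's view of it.
% (i)  Pinsker's inequality, applied for each realization of X;
% (ii) Jensen's inequality (the square root is concave), together with
%      E_X[ D_KL(P_{Z|X} || P_Z) ] = I(X;Z).
\mathbb{E}_{X}\!\left[ d_{\mathrm{TV}}\!\left(P_{Z\mid X},\, P_{Z}\right) \right]
\;\overset{(i)}{\le}\;
\mathbb{E}_{X}\!\left[ \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}\!\left(P_{Z\mid X} \,\Vert\, P_{Z}\right)} \right]
\;\overset{(ii)}{\le}\;
\sqrt{\tfrac{1}{2}\, I(X;Z)}
```

Once I(X;Z) is bounded, e.g. via capacity results for a SIMO channel with correlated noise as the abstract describes, a distinguishing-type leakage bound follows.

Second, a minimal Python sketch of one common flavor of noise-based secret sharing over the reals. The function names, the polynomial-evaluation construction, and all parameter values are assumptions for illustration; the paper's exact scheme may differ.

```python
import numpy as np

def share(x, n_servers=5, t_priv=2, noise_std=10.0, rng=None):
    """Hypothetical analog secret sharing: each share is an evaluation of
    f(a) = x + z_1*a + ... + z_t*a^t at a distinct real point, where the
    z_k are i.i.d. Gaussian noise terms masking x from small collusions
    of servers (the residual leakage is what such schemes must quantify)."""
    rng = np.random.default_rng(rng)
    alphas = np.linspace(-1.0, 1.0, n_servers)          # distinct evaluation points
    noise = noise_std * rng.standard_normal((t_priv,) + np.shape(x))
    shares = []
    for a in alphas:
        s = np.array(x, dtype=float)
        for k in range(1, t_priv + 1):
            s = s + noise[k - 1] * (a ** k)
        shares.append(s)
    return alphas, shares

def reconstruct(alphas, shares, t_priv=2):
    """Recover x (the degree-0 coefficient) by solving the Vandermonde system."""
    V = np.vander(np.asarray(alphas), N=t_priv + 1, increasing=True)
    S = np.stack([np.ravel(s) for s in shares])         # one row per server's share
    coeffs, *_ = np.linalg.lstsq(V, S, rcond=None)
    return coeffs[0].reshape(np.shape(shares[0]))

x = np.array([[0.3, -1.2], [2.5, 0.7]])
alphas, shares = share(x, rng=0)
print(np.allclose(reconstruct(alphas, shares), x))      # True, up to floating-point error
```

Since everything stays in floating point, there is no finite-field overflow to manage; instead, the noise variance trades off directly against the numerical accuracy of reconstruction and of any computation on the shares, consistent with the privacy-accuracy trade-off mentioned in the abstract.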
Related papers
- Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z) - Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx, and Federated Curvature (FedCurv), have already been proposed.
As a side product of this work, we release the non-IID version of the datasets we used, so as to facilitate further comparisons within the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for
Federated Learning on Non-IID Data [69.0785021613868]
Federated learning is a distributed machine learning approach which enables a shared server model to learn by aggregating the locally-computed parameter updates with the training data from spatially-distributed client silos.
We propose the Federated Invariant Learning Consistency (FedILC) approach, which leverages the gradient covariance and the geometric mean of Hessians to capture both inter-silo and intra-silo consistencies.
This is relevant to various fields such as healthcare, computer vision, and the Internet of Things (IoT).
arXiv Detail & Related papers (2022-05-19T03:32:03Z) - Data-SUITE: Data-centric identification of in-distribution incongruous
examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z) - A communication efficient distributed learning framework for smart
environments [0.4898659895355355]
This paper proposes a distributed learning framework to move data analytics closer to where data is generated.
Using distributed machine learning techniques, it is possible to drastically reduce the network overhead, while obtaining performance comparable to the cloud solution.
The analysis also shows when each distributed learning approach is preferable, based on the specific distribution of the data on the nodes.
arXiv Detail & Related papers (2021-09-27T13:44:34Z) - Comprehensive Graph-conditional Similarity Preserving Network for
Unsupervised Cross-modal Hashing [97.44152794234405]
Unsupervised cross-modal hashing (UCMH) has become a hot topic recently.
In this paper, we devise a deep graph-neighbor coherence preserving network (DGCPN).
DGCPN regulates comprehensive similarity preserving losses by exploiting three types of data similarities.
arXiv Detail & Related papers (2020-12-25T07:40:59Z) - Anonymizing Sensor Data on the Edge: A Representation Learning and
Transformation Approach [4.920145245773581]
In this paper, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation.
We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data.
We show that it can anonymize data in real time on resource-constrained edge devices.
arXiv Detail & Related papers (2020-11-16T22:32:30Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - User-Level Privacy-Preserving Federated Learning: Analysis and
Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
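A minimal hedged sketch of the noise-addition step described in the UDP entry above (clip the local update to bound one user's influence, then add Gaussian noise before upload). The function and parameter names are illustrative assumptions, not that paper's actual algorithm or API.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Illustrative user-level noise addition: clip the locally computed model
    update, then add Gaussian noise calibrated to the clipping norm before the
    update is uploaded to the server."""
    rng = np.random.default_rng(rng)
    update = np.asarray(update, dtype=float)
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    noise = noise_multiplier * clip_norm * rng.standard_normal(update.shape)
    return update * scale + noise
```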
This list is automatically generated from the titles and abstracts of the papers on this site.