Related papers: Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition

Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition

URL: http://arxiv.org/abs/2405.04344v2
Date: Thu, 9 May 2024 04:36:12 GMT
Title: Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition
Authors: Chenxi Qiu,
Abstract summary: Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps.
Score: 1.283608820493284
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear programming (LP), which, however, might suffer from a polynomial explosion of decision variables, rendering it impractical in large-scale mDP. In this paper, our objective is to develop a new computation framework to enhance the scalability of the LP-based mDP. Considering the connections established by the mDP constraints among the secret records, we partition the original secret dataset into various subsets. Building upon the partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program to manage the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data in the road network/grid maps, text data, and synthetic data, underscore our proposed mechanism's superior scalability and efficiency.

Related papers

Second-Order Convergence in Private Stochastic Non-Convex Optimization [28.00987194971941]
We investigate the problem of finding second-order stationary points (SOS) in differentially private (DP) non-dimensional identification optimization.<n>Existing methods suffer from inaccurate convergence error due to gradient variance in the saddle point escape analysis.<n>We develop a new DP algorithm that rectifies the convergence error reported in prior work.
arXiv Detail & Related papers (2025-05-21T15:25:23Z)
DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators [47.86275136491794]
We propose DP-2Stage, a two-stage fine-tuning framework for differentially private data generation. Our empirical results show that this approach improves performance across various settings and metrics.
arXiv Detail & Related papers (2024-12-03T14:10:09Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs) Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU) Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning. We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z)
Mode-wise Principal Subspace Pursuit and Matrix Spiked Covariance Model [13.082805815235975]
We introduce a novel framework called Mode-wise Principal Subspace Pursuit (MOP-UP) to extract hidden variations in both the row and column dimensions for matrix data. The effectiveness and practical merits of the proposed framework are demonstrated through experiments on both simulated and real datasets.
arXiv Detail & Related papers (2023-07-02T13:59:47Z)
Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server. We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered. Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
Federated Coordinate Descent for Privacy-Preserving Multiparty Linear Regression [0.5049057348282932]
We present Federated Coordinate Descent, a new distributed scheme called FCD, to address this issue securely under multiparty scenarios. Specifically, through secure aggregation and added perturbations, our scheme guarantees that: (1) no local information is leaked to other parties, and (2) global model parameters are not exposed to cloud servers. We show that the FCD scheme fills the gap of multiparty secure Coordinate Descent methods and is applicable for general linear regressions, including linear, ridge and lasso regressions.
arXiv Detail & Related papers (2022-09-16T03:53:46Z)
BATS: Best Action Trajectory Stitching [22.75880303352508]
We introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset. We prove that this property allows one to make upper and lower bounds on the value function up to appropriate distance metrics. We show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.
arXiv Detail & Related papers (2022-04-26T01:48:32Z)
Consistency and Diversity induced Human Motion Segmentation [231.36289425663702]
We propose a novel Consistency and Diversity induced human Motion (CDMS) algorithm. Our model factorizes the source and target data into distinct multi-layer feature spaces. A multi-mutual learning strategy is carried out to reduce the domain gap between the source and target data.
arXiv Detail & Related papers (2022-02-10T06:23:56Z)
Privacy-preserving Data Sharing on Vertically Partitioned Data [16.167363414383576]
We introduce a differentially private method for generating synthetic data from vertically partitioned data. We train a mixture model over partitioned data using variational inference. We rigorously define the privacy guarantees with respect to the different players in the system.
arXiv Detail & Related papers (2020-10-19T08:10:34Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF. It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.