Outsourced Privacy-Preserving Feature Selection Based on Fully Homomorphic Encryption
- URL: http://arxiv.org/abs/2505.12869v1
- Date: Mon, 19 May 2025 08:55:56 GMT
- Title: Outsourced Privacy-Preserving Feature Selection Based on Fully Homomorphic Encryption
- Authors: Koki Wakiyama, Tomohiro I, Hiroshi Sakamoto
- Abstract summary: Feature selection is a technique that extracts a meaningful subset from a set of features in training data. This study proposes a privacy-preserving model for feature selection.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature selection is a technique that extracts a meaningful subset from a set of features in training data. When the training data is large-scale, appropriate feature selection enables the removal of redundant features, which can improve generalization performance, accelerate the training process, and enhance the interpretability of the model. This study proposes a privacy-preserving computation model for feature selection. Generally, when the data owner and the analyst are the same party, there is no need to conceal the private information. However, when they are different parties or when multiple owners exist, an appropriate privacy-preserving framework is required. Although various private feature selection algorithms have been proposed, they all require two or more computing parties and do not guarantee security in environments where no external party can be fully trusted. To address this issue, we propose the first outsourcing algorithm for feature selection using fully homomorphic encryption. Compared to a prior two-party algorithm, our result improves the time and space complexity from O(kn^2) to O(kn log^3 n) and O(kn), respectively, where k and n denote the number of features and data samples. We also implemented the proposed algorithm and conducted comparative experiments with a naive implementation. The experimental results show the efficiency of our method even on small datasets.
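As a point of reference for what is being outsourced, a filter-style feature selection step can be sketched in plaintext as follows. This is an illustrative analogue only, not the paper's FHE algorithm: the function name and the choice of Pearson correlation as the scoring criterion are assumptions for the example.

```python
import numpy as np

def select_top_features(X, y, k):
    """Rank features by absolute Pearson correlation with the label y
    and return the indices of the k highest-scoring features."""
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the label
    num = Xc.T @ yc                  # per-feature covariance numerator
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(num / den)       # |Pearson r| per feature
    return np.argsort(scores)[::-1][:k]
```

Under fully homomorphic encryption, every arithmetic step above would instead be evaluated on ciphertexts, which is where the overhead that the paper's complexity improvement targets comes from.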
Related papers
- On the (In)Significance of Feature Selection in High-Dimensional Datasets [0.5266869303483376]
We test the null hypothesis of using randomly selected features to compare against features selected by FS algorithms. Our results show that FS on high-dimensional datasets (in particular gene expression) in classification tasks is not useful.
arXiv Detail & Related papers (2025-08-05T15:58:31Z) - Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe. The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy. We propose an algorithm for this problem, MaximumDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight.
arXiv Detail & Related papers (2025-02-13T01:27:11Z) - Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines. We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Oracle-Efficient Differentially Private Learning with Public Data [21.771932463130252]
We present the first computationally efficient algorithms that provably leverage public data to learn privately whenever a function class is learnable non-privately.
We provide specialized algorithms with improved sample complexities in the special cases when the function class is convex or when the task is binary classification.
arXiv Detail & Related papers (2024-02-13T23:40:50Z) - Privacy-Optimized Randomized Response for Sharing Multi-Attribute Data [1.1510009152620668]
We propose a privacy-optimized randomized response that guarantees the strongest privacy in sharing multi-attribute data.
We also present an efficient algorithm for constructing a near-optimal attribute mechanism.
Our methods provide significantly stronger privacy guarantees for the entire dataset than the existing method.
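For context, the classical single-bit randomized response mechanism that such multi-attribute schemes generalize can be sketched as follows. This is a textbook baseline, not the paper's optimized mechanism, and the function names are illustrative.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability p = e^eps / (e^eps + 1);
    this satisfies epsilon-local differential privacy."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p else 1 - bit

def debiased_mean(reports, epsilon):
    """Recover an unbiased estimate of the true mean from noisy reports:
    E[report] = (2p - 1) * mu + (1 - p), so invert that affine map."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

With many users, the debiased estimate converges to the true mean while each individual report remains plausibly deniable.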
arXiv Detail & Related papers (2024-02-12T11:34:42Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - Optimal Algorithms for Mean Estimation under Local Differential Privacy [55.32262879188817]
We show that PrivUnit achieves the optimal variance among a large family of locally private randomizers.
We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees.
arXiv Detail & Related papers (2022-05-05T06:43:46Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Privacy preserving n-party scalar product protocol [0.0]
Privacy-preserving machine learning enables the training of models on decentralized datasets without the need to reveal the data.
The privacy-preserving scalar product protocol, which enables computing the dot product of vectors without revealing them, is a popular example thanks to its versatility.
We propose a generalization of the protocol for an arbitrary number of parties, based on an existing two-party method.
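A single-process simulation of a two-party scalar product protocol in the Du-Atallah commodity-server style, one plausible instantiation of the kind of two-party method this entry refers to; the dealer and the variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dealer(n):
    """Trusted initializer: hand out correlated randomness satisfying
    ra + rb == Ra . Rb, then go offline."""
    Ra = rng.integers(-100, 100, n)
    Rb = rng.integers(-100, 100, n)
    ra = int(rng.integers(-100, 100))
    rb = int(Ra @ Rb) - ra
    return (Ra, ra), (Rb, rb)

def scalar_product(x, y):
    """Alice holds x, Bob holds y; Alice learns x . y while each party
    only ever sees the other's vector masked by dealer randomness."""
    (Ra, ra), (Rb, rb) = dealer(len(x))
    x_hat = x + Ra                 # Alice -> Bob (x masked by Ra)
    y_hat = y + Rb                 # Bob -> Alice (y masked by Rb)
    t = int(x_hat @ y) + rb        # Bob -> Alice
    # t - Ra.y_hat + ra = x.y + Ra.y + rb - Ra.y - Ra.Rb + ra = x.y
    return t - int(Ra @ y_hat) + ra
```

The masks cancel exactly because the dealer's shares satisfy ra + rb = Ra . Rb, so Alice recovers the dot product without ever seeing y in the clear.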
arXiv Detail & Related papers (2021-12-17T11:14:53Z) - New Oracle-Efficient Algorithms for Private Synthetic Data Release [52.33506193761153]
We present three new algorithms for constructing differentially private synthetic data.
The algorithms satisfy differential privacy even in the worst case.
Compared to the state-of-the-art High-Dimensional Matrix Mechanism [McKennaMHM18], our algorithms provide better accuracy on large workloads.
arXiv Detail & Related papers (2020-07-10T15:46:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.