On PAC Learning Halfspaces in Non-interactive Local Privacy Model with
Public Unlabeled Data
- URL: http://arxiv.org/abs/2209.08319v1
- Date: Sat, 17 Sep 2022 12:19:20 GMT
- Title: On PAC Learning Halfspaces in Non-interactive Local Privacy Model with
Public Unlabeled Data
- Authors: Jinyan Su and Jinhui Xu and Di Wang
- Abstract summary: We study the problem of PAC learning halfspaces in the non-interactive local differential model (NLDP)
We show that it is possible to achieve sample complexities that are only linear in the dimension and in other terms for both private and public data.
- Score: 18.820311737806456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of PAC learning halfspaces in the
non-interactive local differential privacy model (NLDP). To breach the barrier
of exponential sample complexity, previous results studied a relaxed setting
where the server has access to some additional public but unlabeled data. We
continue in this direction. Specifically, we consider the problem under the
standard setting instead of the large margin setting studied before. Under
different mild assumptions on the underlying data distribution, we propose two
approaches that are based on the Massart noise model and self-supervised
learning and show that it is possible to achieve sample complexities that are
only linear in the dimension and polynomial in other terms for both private and
public data, which significantly improve the previous results. Our methods
could also be used for other private PAC learning problems.
Related papers
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Towards Split Learning-based Privacy-Preserving Record Linkage [49.1574468325115]
Split Learning has been introduced to facilitate applications where user data privacy is a requirement.
In this paper, we investigate the potentials of Split Learning for Privacy-Preserving Record Matching.
arXiv Detail & Related papers (2024-09-02T09:17:05Z) - Empirical Mean and Frequency Estimation Under Heterogeneous Privacy: A Worst-Case Analysis [5.755004576310333]
Differential Privacy (DP) is the current gold-standard for measuring privacy.
We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, subject to heterogeneous privacy constraints.
We prove some optimality results, under both PAC error and mean-squared error, for our proposed algorithms and demonstrate superior performance over other baseline techniques experimentally.
arXiv Detail & Related papers (2024-07-15T22:46:02Z) - LLM-based Privacy Data Augmentation Guided by Knowledge Distillation
with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z) - Optimal Locally Private Nonparametric Classification with Public Data [2.631955426232593]
We investigate the problem of public data assisted non-interactive Local Differentially Private (LDP) learning with a focus on non-parametric classification.
Under the posterior drift assumption, we derive the mini-max optimal convergence rate with LDP constraint.
We present a novel approach, the locally differentially private classification tree, which attains the mini-max optimal convergence rate.
arXiv Detail & Related papers (2023-11-19T16:35:01Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - PILLAR: How to make semi-private learning more effective [12.292092677396347]
In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data.
We propose a computationally efficient algorithm that achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets.
arXiv Detail & Related papers (2023-06-06T18:45:05Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - On Covariate Shift of Latent Confounders in Imitation and Reinforcement
Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z) - On the Sample Complexity of Adversarial Multi-Source PAC Learning [46.24794665486056]
In a single-source setting, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability.
We show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources.
Our results also show that in a cooperative learning setting sharing data with other parties has provable benefits, even if some participants are malicious.
arXiv Detail & Related papers (2020-02-24T17:19:04Z) - Efficient, Noise-Tolerant, and Private Learning via Boosting [15.62988331732388]
We show how to construct noise-tolerant and private PAC learners for large-margin halfspaces.
This first bound illustrates a general methodology for obtaining PAC learners from privacy.
The second bound uses standard techniques to match the best known sample complexity for differentially private learning of large-margin halfspaces.
arXiv Detail & Related papers (2020-02-04T03:16:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.