Related papers: Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests

Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests

URL: http://arxiv.org/abs/2406.06749v1
Date: Mon, 10 Jun 2024 19:25:19 GMT
Title: Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests
Authors: T. Tony Cai, Abhinav Chakraborty, Lasse Vuursteen,
Abstract summary: Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations. We study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints.
Score: 5.3595271893779906
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations. In this paper, we study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints. We first establish matching lower and upper bounds, up to a logarithmic factor, on the minimax separation rate. This optimal rate serves as a benchmark for the difficulty of the testing problem, factoring in model characteristics such as the number of observations, noise level, and regularity of the signal class, along with the strictness of the $(\epsilon,\delta)$-DP requirement. The results demonstrate interesting and novel phase transition phenomena. Furthermore, the results reveal an interesting phenomenon that distributed one-shot protocols with access to shared randomness outperform those without access to shared randomness. We also construct a data-driven testing procedure that possesses the ability to adapt to an unknown regularity parameter over a large collection of function classes with minimal additional cost, all while maintaining adherence to the same set of DP constraints.

Related papers

Low coordinate degree algorithms II: Categorical signals and generalized stochastic block models [2.4889993472438383]
We study when low coordinate degree functions can test for the presence of categorical structure in high-dimensional data. This complements the first paper of this series, which studied the power of LCDF in testing for continuous structure.
arXiv Detail & Related papers (2024-12-30T18:34:36Z)
Minimax Optimal Two-Sample Testing under Local Differential Privacy [3.3317825075368908]
We explore the trade-off between privacy and statistical utility in private two-sample testing under local differential privacy (LDP) We introduce private permutation tests using practical privacy mechanisms such as Laplace, discrete Laplace, and Google's RAPPOR. We study continuous data via binning and study its uniform separation rates under LDP over H"older and Besov smoothness classes.
arXiv Detail & Related papers (2024-11-13T22:44:25Z)
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance [4.291589126905706]
In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy. However, the reliability of test accuracy as the primary performance metric has been called into question. The distribution of hard samples between training and test sets affects the difficulty levels of those sets. We propose a benchmarking procedure for comparing hard sample identification methods.
arXiv Detail & Related papers (2024-09-22T11:38:14Z)
Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints [6.042269506496206]
We first establish the minimax misclassification rate, precisely characterizing the effects of privacy constraints, source samples, and target samples on classification accuracy. The results reveal interesting phase transition phenomena and highlight the intricate trade-offs between preserving privacy and achieving classification accuracy.
arXiv Detail & Related papers (2024-06-28T17:55:41Z)
Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Longtailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples. Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance. We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server. We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered. Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset. This can result in label-conditional distributions that are imbalanced with respect to another latent attribute. We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z)
MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL) We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately. Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs. Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems. In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users. An adversary may still be able to infer the private training data by attacking the released model. Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
Minimax optimal goodness-of-fit testing for densities and multinomials under a local differential privacy constraint [3.265773263570237]
We consider the consequences of local differential privacy constraints on goodness-of-fit testing. We present a test that is adaptive to the smoothness parameter of the unknown density and remains minimax optimal up to a logarithmic factor.
arXiv Detail & Related papers (2020-02-11T08:41:05Z)
Fundamental Limits of Testing the Independence of Irrelevant Alternatives in Discrete Choice [9.13127392774573]
The Multinomial Logit (MNL) model and the Independence of Irrelevant Alternatives (IIA) are the most widely used tools of discrete choice. We show that any general test for IIA with low worst-case error would require a number of samples exponential in the number of alternatives of the choice problem. Our lower bounds are structure-dependent, and as a potential cause for optimism, we find that if one restricts the test of IIA to violations that can occur in a specific collection of choice sets, one obtains structure-dependent lower bounds that are much less pessimistic.
arXiv Detail & Related papers (2020-01-20T10:15:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.