Related papers: Quantifying Classifier Utility under Local Differential Privacy

Quantifying Classifier Utility under Local Differential Privacy

URL: http://arxiv.org/abs/2507.02727v1
Date: Thu, 03 Jul 2025 15:42:10 GMT
Title: Quantifying Classifier Utility under Local Differential Privacy
Authors: Ye Zheng, Yidan Hu,
Abstract summary: Local differential privacy (LDP) provides a quantifiable privacy guarantee for personal data by introducing perturbation at the data source.<n>This paper presents a framework for theoretically quantifying classifier utility under LDP mechanisms.
Score: 5.90975025491779
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Local differential privacy (LDP) provides a rigorous and quantifiable privacy guarantee for personal data by introducing perturbation at the data source. However, quantifying the impact of these perturbations on classifier utility remains a theoretical challenge, particularly for complex or black-box classifiers. This paper presents a framework for theoretically quantifying classifier utility under LDP mechanisms. The key insight is that LDP perturbation is concentrated around the original data with a specific probability, transforming utility analysis of the classifier into its robustness analysis in this concentrated region. Our framework connects the concentration analysis of LDP mechanisms with the robustness analysis of classifiers. It treats LDP mechanisms as general distributional functions and classifiers as black-box functions, thus applicable to any LDP mechanism and classifier. A direct application of our utility quantification is guiding the selection of LDP mechanisms and privacy parameters for a given classifier. Notably, our analysis shows that a piecewise-based mechanism leads to better utility compared to alternatives in common scenarios. Using this framework alongside two novel refinement techniques, we conduct case studies on utility quantification for typical mechanism-classifier combinations. The results demonstrate that our theoretical utility quantification aligns closely with empirical observations, particularly when classifiers operate in lower-dimensional input spaces.

Related papers

Causal Mechanism Estimation in Multi-Sensor Systems Across Multiple Domains [34.02899134427814]
We present a novel three-step approach to inferring causal mechanisms from heterogeneous data collected across multiple domains.<n>By leveraging the principle of Causal Transfer Learning (CTL), CICME is able to reliably detect domain-invariant causal mechanisms when provided with sufficient samples.<n>We show that CICME leverages the benefits of applying causal discovery on the pooled data and repeatedly on data from individual domains, and it even outperforms both baseline methods under certain scenarios.
arXiv Detail & Related papers (2025-07-23T10:35:37Z)
Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets.<n>We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms.<n>Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z)
Optimal Piecewise-based Mechanism for Collecting Bounded Numerical Data under Local Differential Privacy [5.258253745672122]
Local differential privacy (LDP) has been shown as a framework providing provable individual privacy, even when the third-party platform is untrusted.<n>This paper investigates the optimal design of piecewise-based mechanisms to maximize data utility under LDP.
arXiv Detail & Related papers (2025-05-21T13:01:41Z)
Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution.<n>Our contribution is to provide the first general estimation technique for transportability problems.<n>We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z)
Differentially Private Range Queries with Correlated Input Perturbation [9.169888822435757]
We propose a class of differentially private mechanisms for linear queries that leverage correlated input perturbations to achieve unbiasedness, consistency, statistical transparency, and control over utility requirements. Our theoretical and empirical analysis demonstrates that we achieve near-optimal utility, effectively compete with other methods, and retain all the favorable statistical properties discussed earlier.
arXiv Detail & Related papers (2024-02-10T23:42:05Z)
Boosted Control Functions: Distribution generalization and invariance in confounded models [10.503777692702952]
We introduce a strong notion of invariance that allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions.<n>We propose the ControlTwicing algorithm to estimate the Boosted Control Function (BCF) using flexible machine-learning techniques.
arXiv Detail & Related papers (2023-10-09T15:43:46Z)
Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within PAC-Bayes risk certification paradigm. Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
Controlling for sparsity in sparse factor analysis models: adaptive latent feature sharing for piecewise linear dimensionality reduction [2.896192909215469]
We propose a simple and tractable parametric feature allocation model which can address key limitations of current latent feature decomposition techniques. We derive a novel adaptive Factor analysis (aFA), as well as, an adaptive probabilistic principle component analysis (aPPCA) capable of flexible structure discovery and dimensionality reduction. We show that aPPCA and aFA can infer interpretable high level features both when applied on raw MNIST and when applied for interpreting autoencoder features.
arXiv Detail & Related papers (2020-06-22T16:09:11Z)
Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis ( EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA. The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering. In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional. We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
arXiv Detail & Related papers (2020-01-23T02:37:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.