Related papers: Data value estimation on private gradients

URL: http://arxiv.org/abs/2412.17008v1
Date: Sun, 22 Dec 2024 13:15:51 GMT
Title: Data value estimation on private gradients
Authors: Zijian Zhou, Xinyi Xu, Daniela Rus, Bryan Kian Hsiang Low
Abstract summary: For gradient-based machine learning (ML) methods, the de facto differential privacy (DP) technique is perturbing the gradients with random noise. Data valuation attributes the ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP. Can existing data valuation methods still be used when DP is enforced via gradient perturbations? We show that the answer is no with the default approach of injecting i.i.d. random noise to the gradients, because the estimation uncertainty of the data value estimation paradoxically scales linearly with the estimation budget. We propose to instead inject carefully correlated noise to provably remove the linear scaling of estimation uncertainty w.r.t. the budget.
Score: 84.966853523107
License: http://creativecommons.org/licenses/by/4.0/
Abstract: For gradient-based machine learning (ML) methods commonly adopted in practice such as stochastic gradient descent, the de facto differential privacy (DP) technique is perturbing the gradients with random Gaussian noise. Data valuation attributes the ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP such as data pricing, collaborative ML, and federated learning (FL). Can existing data valuation methods still be used when DP is enforced via gradient perturbations? We show that the answer is no with the default approach of injecting i.i.d. random noise to the gradients because the estimation uncertainty of the data value estimation paradoxically linearly scales with more estimation budget, producing estimates almost like random guesses. To address this issue, we propose to instead inject carefully correlated noise to provably remove the linear scaling of estimation uncertainty w.r.t. the budget. We also empirically demonstrate that our method gives better data value estimates on various ML tasks and is applicable to use cases including dataset valuation and FL.
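Illustration: The gradient perturbation described in the abstract can be sketched roughly as follows. This is a minimal sketch and not the authors' method or code: the function names, the clipping step (standard in DP-SGD but not stated in the abstract), and all parameter values are assumptions made for illustration. The second function only demonstrates the elementary fact that the variance of a sum of k independent noise draws grows linearly in k, which gives some intuition for why i.i.d. noise can accumulate across many evaluations; it does not reproduce the paper's analysis or its correlated-noise remedy.

```python
import numpy as np


def privatize_gradient(grad, clip_norm, noise_multiplier, rng):
    """Clip grad to L2 norm clip_norm, then add i.i.d. Gaussian noise.

    Hypothetical helper: clipping and the noise scale
    (noise_multiplier * clip_norm) follow common DP-SGD practice
    and are assumptions, not details taken from the abstract.
    """
    norm = np.linalg.norm(grad)
    scale = min(1.0, clip_norm / (norm + 1e-12))
    clipped = grad * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise


def noise_variance_of_sum(k, dim, clip_norm, noise_multiplier, trials, rng):
    """Empirical per-coordinate variance of the sum of k independent noise draws."""
    sigma = noise_multiplier * clip_norm
    sums = rng.normal(0.0, sigma, size=(trials, k, dim)).sum(axis=1)
    return sums.var()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = np.array([3.0, 4.0])
    print("private gradient:", privatize_gradient(g, 1.0, 1.0, rng))
    for k in (10, 20, 40):
        v = noise_variance_of_sum(k, 8, 1.0, 1.0, 2000, rng)
        print(f"k={k}: variance of summed noise ~ {v:.2f}")
```

Under these assumptions, the printed variances should grow roughly in proportion to k, since independent noise terms do not cancel when summed.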

Related papers

Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private stochastic convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications. Current methods, such as those based on differentially private stochastic gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility. We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
Noise-Aware Differentially Private Variational Inference [5.4619385369457225]
Differential privacy (DP) provides robust privacy guarantees for statistical inference, but this can lead to unreliable results and biases in downstream applications. We propose a novel method for noise-aware approximate Bayesian inference based on gradient variational inference. We also propose a more accurate evaluation method for noise-aware posteriors.
arXiv Detail & Related papers (2024-10-25T08:18:49Z)
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions [34.99034454081842]
Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In this work, we focus on influence functions, a popular gradient-based data valuation method, and significantly improve its scalability. We also introduce LogIX, a software package that can transform existing training code into data valuation code with minimal effort.
arXiv Detail & Related papers (2024-05-22T19:39:05Z)
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching [22.461036967440723]
We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties. We show these gradients can be efficiently learned with score-matching techniques. We propose Score-Guided Planning (SGP) to enable first-order planning in high-dimensional problems.
arXiv Detail & Related papers (2023-06-24T23:40:58Z)
Transfer Learning for Causal Effect Estimation [12.630663215983706]
We present a Transfer Causal Learning framework to improve causal effect estimation accuracy in limited data. Our method is subsequently extended to real data and generates meaningful insights consistent with medical literature.
arXiv Detail & Related papers (2023-05-16T03:13:55Z)
Statistical Inference with Stochastic Gradient Methods under $\phi$-mixing Data [9.77185962310918]
We propose a mini-batch SGD estimator for statistical inference when the data is $\phi$-mixing. The confidence intervals are constructed using an associated mini-batch SGD procedure. The proposed method is memory-efficient and easy to implement in practice.
arXiv Detail & Related papers (2023-02-24T16:16:43Z)
Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models. In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints. A second motivation for bias-constrained estimators (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference from inputs containing missing values without imputation. We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties. Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users. An adversary may still be able to infer the private training data by attacking the released model. Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
TraDE: Transformers for Density Estimation [101.20137732920718]
TraDE is a self-attention-based architecture for auto-regressive density estimation. We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
arXiv Detail & Related papers (2020-04-06T07:32:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.