Related papers: Instance-Optimal Private Density Estimation in the Wasserstein Distance

URL: http://arxiv.org/abs/2406.19566v1
Date: Thu, 27 Jun 2024 22:51:06 GMT
Title: Instance-Optimal Private Density Estimation in the Wasserstein Distance
Authors: Vitaly Feldman, Audra McMillan, Satchit Sivakumar, Kunal Talwar
Abstract summary: Estimating the density of a distribution from samples is a fundamental problem in statistics. We study differentially private density estimation in the Wasserstein distance.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances. For distributions $P$ over $\mathbb{R}$, we consider a strong notion of instance-optimality: an algorithm that uniformly achieves the instance-optimal estimation rate is competitive with an algorithm that is told that the distribution is either $P$ or $Q_P$ for some distribution $Q_P$ whose probability density function (pdf) is within a factor of 2 of the pdf of $P$. For distributions over $\mathbb{R}^2$, we use a different notion of instance optimality. We say that an algorithm is instance-optimal if it is competitive with an algorithm that is given a constant-factor multiplicative approximation of the density of the distribution. We characterize the instance-optimal estimation rates in both these settings and show that they are uniformly achievable (up to polylogarithmic factors). Our approach for $\mathbb{R}^2$ extends to arbitrary metric spaces as it goes via hierarchically separated trees. As a special case our results lead to instance-optimal private learning in TV distance for discrete distributions.
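The abstract measures error in the Wasserstein distance and mentions TV distance for discrete distributions. To make those two metrics concrete, here is a minimal NumPy sketch. It assumes distributions on [0, 1] discretized into equal-width bins. In one dimension, W1 equals the L1 distance between the two CDFs, and TV is half the L1 distance between the probability vectors. The `laplace_histogram` helper (a hypothetical name) is a naive, non-adaptive ε-DP baseline made of Laplace-noised bin counts that are then clipped and renormalized. It assumes add/remove-one neighboring datasets. It is not the instance-optimal algorithm from the paper.

```python
import numpy as np


def w1_on_grid(p, q, bin_width):
    """1-Wasserstein distance between two distributions on an evenly spaced
    grid (spacing `bin_width`). In 1D, W1 is the L1 distance between CDFs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cdf_gap = np.cumsum(p) - np.cumsum(q)
    return float(bin_width * np.abs(cdf_gap).sum())


def tv(p, q):
    """Total variation distance between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(0.5 * np.abs(p - q).sum())


def laplace_histogram(samples, num_bins, epsilon, rng):
    """Naive epsilon-DP density estimate on [0, 1] (illustrative baseline only).

    Assumption: add/remove-one neighboring datasets, so the count vector has
    L1 sensitivity 1 and Laplace noise of scale 1/epsilon suffices. Clipping
    and renormalizing are post-processing, so they preserve privacy.
    """
    counts, _ = np.histogram(samples, bins=num_bins, range=(0.0, 1.0))
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_bins)
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0.0:
        # Every noisy count was clipped away: fall back to the uniform density.
        return np.full(num_bins, 1.0 / num_bins)
    return noisy / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.beta(2.0, 5.0, size=10_000)
    k = 64
    empirical, _ = np.histogram(samples, bins=k, range=(0.0, 1.0))
    empirical = empirical / empirical.sum()
    private = laplace_histogram(samples, k, epsilon=1.0, rng=rng)
    print("W1 to empirical:", w1_on_grid(empirical, private, 1.0 / k))
    print("TV to empirical:", tv(empirical, private))
```

The two metrics can behave very differently. Moving a small amount of mass to an adjacent bin barely changes W1 but can change TV noticeably. This is why the abstract presents Wasserstein error as a fitting metric for "roughly where the mass is."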

Related papers

Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor
Estimating the density of a distribution from its samples is a fundamental problem in statistics. Hypothesis selection addresses the setting where, in addition to a sample set, we are given $n$ candidate distributions. We propose a differentially private algorithm in the central model that runs in nearly-linear time with respect to the number of hypotheses.
arXiv, 2025-06-01T20:46:46Z
Instance-Optimality for Private KL Distribution Estimation
We study the fundamental problem of estimating an unknown discrete distribution $p$ over $d$ symbols, given $n$ i.i.d. samples from the distribution. We propose algorithms that achieve instance-optimality up to constant factors, with and without a differential privacy constraint.
arXiv, 2025-05-29T16:27:57Z
Sublinear Algorithms for Wasserstein and Total Variation Distances: Applications to Fairness and Privacy Auditing
We propose a generic framework to learn the probability and cumulative distribution functions (PDFs and CDFs) of a sub-Weibull distribution. We compute mergeable summaries of distributions from the stream of samples while requiring only sublinear space. Our algorithms significantly improve on existing methods for distance estimation, which incur super-linear time and linear space complexities.
arXiv, 2025-03-10T18:57:48Z
Statistical-Computational Trade-offs for Density Estimation
We show that for a broad class of data structures their bounds cannot be significantly improved. This is a novel statistical-computational trade-off for density estimation.
arXiv, 2024-10-30T15:03:33Z
Relative-Translation Invariant Wasserstein Distance
We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$). We show that $RW_p$ distances are also real distance metrics defined on the quotient set $\mathcal{P}_p(\mathbb{R}^n)/\sim$ invariant to distribution translations.
arXiv, 2024-09-04T03:41:44Z
Robust Distribution Learning with Local and Global Adversarial Corruptions
We develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\, n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator.
arXiv, 2024-06-10T17:48:36Z
Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+\alpha$ Moments
We introduce a new definitional framework to analyze the fine-grained optimality of algorithms. We show that median-of-means is neighborhood optimal, up to constant factors. It remains open to find a neighborhood-separated estimator without constant-factor slackness.
arXiv, 2023-11-21T18:50:38Z
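The paper above shows median-of-means to be neighborhood optimal. The estimator itself is short: split the sample into groups, average each group, and report the median of the group averages. Below is a minimal NumPy sketch. It assumes the caller supplies the number of groups (in practice this is usually tied to the desired failure probability). The function name `median_of_means` is our own, and the optional shuffle is an illustrative choice.

```python
import numpy as np


def median_of_means(x, num_groups, rng=None):
    """Median-of-means estimate of the mean of the sample `x`.

    Splits the sample (shuffled first if `rng` is given) into `num_groups`
    nearly equal groups, averages each group, and returns the median of
    those group averages.
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= num_groups <= len(x):
        raise ValueError("num_groups must be between 1 and len(x)")
    if rng is not None:
        x = rng.permutation(x)
    groups = np.array_split(x, num_groups)  # no empty groups since num_groups <= len(x)
    return float(np.median([g.mean() for g in groups]))


if __name__ == "__main__":
    data = [1.0, 1.0, 1.0, 1.0, 1.0, 1000.0]
    print("mean:", np.mean(data))
    print("median-of-means:", median_of_means(data, 3))
```

A single large outlier corrupts only the group that contains it. The median then ignores that group's average, which is the source of the estimator's robustness to heavy tails.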
Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression
We show that the state-of-the-art density ratio estimators perform poorly on well-separated cases. We present an alternative method that leverages multi-class classification for density ratio estimation.
arXiv, 2023-05-01T15:10:56Z
Energy-Based Sliced Wasserstein Distance
A key component of the sliced Wasserstein (SW) distance is the slicing distribution. We propose to design the slicing distribution as an energy-based distribution that is parameter-free. We then derive a novel sliced Wasserstein metric, the energy-based sliced Wasserstein (EBSW) distance.
arXiv, 2023-04-26T14:28:45Z
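EBSW changes the slicing distribution. For reference, the standard sliced Wasserstein distance it builds on draws slicing directions uniformly from the unit sphere. Below is a minimal NumPy sketch of that uniform-slicing baseline. It assumes two equal-size samples, so that each 1D projection reduces to matching sorted values. It does not implement the energy-based slicing proposed in the paper, and `sliced_wasserstein` is a hypothetical helper name.

```python
import numpy as np


def sliced_wasserstein(x, y, num_projections=200, p=2, rng=None):
    """Monte Carlo sliced p-Wasserstein distance between equal-size point
    clouds x and y of shape (n, d), using uniformly random directions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal-size samples in the same dimension")
    rng = np.random.default_rng() if rng is None else rng
    # Uniform directions on the unit sphere: normalize standard Gaussian vectors.
    theta = rng.standard_normal((num_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each direction and sort. In 1D the optimal coupling of two
    # equal-size samples matches order statistics.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)
    per_slice = np.mean(np.abs(x_proj - y_proj) ** p, axis=0)  # W_p^p per direction
    # Uniform slicing: average W_p^p over directions, then take the p-th root.
    return float(np.mean(per_slice) ** (1.0 / p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((500, 3))
    b = rng.standard_normal((500, 3)) + np.array([1.0, 0.0, 0.0])
    print("SW_2 estimate:", sliced_wasserstein(a, b, rng=rng))
```

Under this baseline every direction counts equally. EBSW instead reweights directions through an energy-based slicing distribution, so more informative projections contribute more.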
Simple Binary Hypothesis Testing under Local Differential Privacy and Communication Constraints
We study simple binary hypothesis testing under both local differential privacy (LDP) and communication constraints. We qualify our results as either minimax optimal or instance optimal.
arXiv, 2023-01-09T18:36:49Z
Linear Optimal Transport Embedding: Provable Wasserstein classification for certain rigid transformations and perturbations
Discriminating between distributions is an important problem in a number of scientific fields. Linear Optimal Transportation (LOT) embeds the space of distributions into an $L^2$-space. We demonstrate the benefits of LOT on a number of distribution classification problems.
arXiv, 2020-08-20T19:09:33Z
Fast and Robust Comparison of Probability Measures in Heterogeneous Spaces
We introduce the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations. Our main contribution is to propose a sweep line algorithm to compute AE exactly in log-quadratic time, where a naive implementation would be cubic. We show that AE and AW perform well in various experimental settings at a fraction of the computational cost of popular GW approximations.
arXiv, 2020-02-05T03:09:23Z

This list is automatically generated from the titles and abstracts of the papers in this site.