Sublinear Algorithms for Wasserstein and Total Variation Distances: Applications to Fairness and Privacy Auditing
- URL: http://arxiv.org/abs/2503.07775v1
- Date: Mon, 10 Mar 2025 18:57:48 GMT
- Title: Sublinear Algorithms for Wasserstein and Total Variation Distances: Applications to Fairness and Privacy Auditing
- Authors: Debabrota Basu, Debarshi Chanda
- Abstract summary: We propose a generic algorithmic framework to estimate the PDF and CDF of any sub-Gaussian distribution while samples from it arrive in a stream. We compute mergeable summaries of distributions from the stream of samples that require sublinear space w.r.t. the number of observed samples. This allows us to estimate Wasserstein and Total Variation (TV) distances between any two sub-Gaussian distributions while samples arrive in streams and from multiple sources.
- Score: 7.81603404636933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Resource-efficiently computing representations of probability distributions and the distances between them while only having access to the samples is a fundamental and useful problem across mathematical sciences. In this paper, we propose a generic algorithmic framework to estimate the PDF and CDF of any sub-Gaussian distribution while samples from it arrive in a stream. We compute mergeable summaries of distributions from the stream of samples that require sublinear space w.r.t. the number of observed samples. This allows us to estimate Wasserstein and Total Variation (TV) distances between any two sub-Gaussian distributions while samples arrive in streams and from multiple sources (e.g. federated learning). Our algorithms significantly improve on existing methods for distance estimation, which incur super-linear time and linear space complexities. In addition, we use the proposed estimators of Wasserstein and TV distances to audit the fairness and privacy of ML algorithms. We empirically demonstrate the efficiency of the algorithms for estimating these distances and auditing using both synthetic and real-world datasets.
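To make the mergeable-summary idea concrete, the following Python sketch summarizes a scalar stream with a fixed-size histogram and reads Wasserstein-1 and TV estimates off the summarized CDFs and PDFs. This is an illustrative stand-in, not the paper's algorithm (whose summaries come with formal sublinear-space guarantees); the class name, bin range, and bin count are assumptions chosen for the demo.

```python
import numpy as np

class StreamSummary:
    """Fixed-size, mergeable histogram summary of a scalar stream.

    Illustrative sketch: because sub-Gaussian tails are light, a bounded
    range [lo, hi] with a fixed number of bins captures almost all of the
    probability mass in O(bins) space, independent of the stream length.
    """

    def __init__(self, lo=-10.0, hi=10.0, bins=256):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.counts = np.zeros(bins, dtype=np.int64)
        self.n = 0

    def update(self, x):
        """Absorb one sample from the stream."""
        frac = (np.clip(x, self.lo, self.hi) - self.lo) / (self.hi - self.lo)
        self.counts[int(frac * (self.bins - 1))] += 1
        self.n += 1

    def merge(self, other):
        """Combine summaries from two sources (e.g. two federated clients)."""
        assert (self.lo, self.hi, self.bins) == (other.lo, other.hi, other.bins)
        self.counts += other.counts
        self.n += other.n
        return self

    def pdf(self):
        return self.counts / max(self.n, 1)


def tv_distance(a, b):
    """TV distance between the summarized distributions (histogram proxy)."""
    return 0.5 * np.abs(a.pdf() - b.pdf()).sum()


def w1_distance(a, b):
    """Wasserstein-1 distance as the L1 gap between summarized CDFs."""
    width = (a.hi - a.lo) / a.bins
    return width * np.abs(np.cumsum(a.pdf()) - np.cumsum(b.pdf())).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s1, s2 = StreamSummary(), StreamSummary()
    for x in rng.normal(0.0, 1.0, 100_000):
        s1.update(x)
    for x in rng.normal(0.5, 1.0, 100_000):
        s2.update(x)
    print(f"W1 ~ {w1_distance(s1, s2):.3f} (exact: 0.500)")
    print(f"TV ~ {tv_distance(s1, s2):.3f} (exact: ~0.197)")
```

Because merging is a single vector addition, summaries computed at different sources (the federated setting mentioned in the abstract) can be combined before any distance is computed.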
Related papers
- Learning local neighborhoods of non-Gaussian graphical models: A measure transport approach [0.3749861135832072]
We propose a scalable algorithm to infer the conditional independence relationships of each variable by exploiting the local Markov property.
The proposed method, named Localized Sparsity Identification for Non-Gaussian Distributions (L-SING), estimates the graph by using flexible classes of transport maps.
arXiv Detail & Related papers (2025-03-18T04:53:22Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
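To ground the scheme above, here is a minimal sketch of the consensus-ADMM splitting that such distributed methods build on, applied to distributed ridge regression rather than sampling; it is not the paper's sampler, and all names and parameters are illustrative assumptions.

```python
import numpy as np

def consensus_admm_ridge(As, bs, lam=0.1, rho=1.0, iters=200):
    """Consensus ADMM over m agents, each holding a private shard (A_i, b_i).

    Hedged illustration: solves distributed ridge regression with the
    standard consensus-ADMM updates; the referenced paper builds an MCMC
    sampler on this style of splitting, which this sketch does not do.
    """
    m, d = len(As), As[0].shape[1]
    xs = [np.zeros(d) for _ in range(m)]      # local primal variables
    us = [np.zeros(d) for _ in range(m)]      # scaled dual variables
    z = np.zeros(d)                           # global consensus variable
    for _ in range(iters):
        for i in range(m):                    # local solves, parallelizable
            H = As[i].T @ As[i] + rho * np.eye(d)
            xs[i] = np.linalg.solve(H, As[i].T @ bs[i] + rho * (z - us[i]))
        xbar, ubar = sum(xs) / m, sum(us) / m
        z = m * rho * (xbar + ubar) / (2 * lam + m * rho)  # central averaging
        for i in range(m):
            us[i] += xs[i] - z                # dual ascent on consensus gap
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_true = rng.normal(size=5)
    As = [rng.normal(size=(40, 5)) for _ in range(4)]
    bs = [A @ w_true + 0.1 * rng.normal(size=40) for A in As]
    A, b = np.vstack(As), np.concatenate(bs)
    w_direct = np.linalg.solve(A.T @ A + 2 * 0.1 * np.eye(5), A.T @ b)
    print("max |ADMM - direct| =",
          np.abs(consensus_admm_ridge(As, bs) - w_direct).max())
```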
- Distributed Bayesian Estimation in Sensor Networks: Consensus on Marginal Densities [15.038649101409804]
We derive a distributed provably-correct algorithm in the functional space of probability distributions over continuous variables.
We leverage these results to obtain new distributed estimators restricted to subsets of variables observed by individual agents.
This relates to applications such as cooperative localization and federated learning, where the data collected at any agent depends on a subset of all variables of interest.
arXiv Detail & Related papers (2023-12-02T21:10:06Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
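As a concrete special case of the linearized-OT idea above: in one dimension the LOT embedding of a distribution with respect to a reference is just its quantile function, and Euclidean distances between embeddings equal W2 distances exactly. The sketch below is an assumption-laden toy, not the LOT Wassmap implementation; it recovers a one-dimensional curve of Gaussians from quantile embeddings via PCA.

```python
import numpy as np

def lot_embedding(samples, levels):
    """1-D LOT embedding: the optimal transport map from a reference measure,
    evaluated on a fixed quantile grid. In 1-D this is the empirical quantile
    function, and L2 distances between embeddings equal W2 distances."""
    return np.quantile(samples, levels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    levels = (np.arange(200) + 0.5) / 200
    # A family of Gaussians whose means trace a 1-D curve in Wasserstein space.
    clouds = [rng.normal(mu, 1.0, 5_000) for mu in np.linspace(-2, 2, 9)]
    E = np.stack([lot_embedding(c, levels) for c in clouds])
    # PCA on the embeddings recovers the latent 1-D parameter (up to sign).
    U, S, _ = np.linalg.svd(E - E.mean(axis=0), full_matrices=False)
    coords = U[:, 0] * S[0] / np.sqrt(len(levels))  # rescale to W2 units
    print(np.round(coords, 2))  # ~ evenly spaced, mirroring the means
```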
- Unsupervised Learning of Sampling Distributions for Particle Filters [80.6716888175925]
We put forward four methods for learning sampling distributions from observed measurements.
Experiments demonstrate that learned sampling distributions exhibit better performance than designed, minimum-degeneracy sampling distributions.
arXiv Detail & Related papers (2023-02-02T15:50:21Z)
- Super-resolution GANs of randomly-seeded fields [68.8204255655161]
We propose a novel super-resolution generative adversarial network (GAN) framework to estimate field quantities from random sparse sensors.
The algorithm exploits random sampling to provide incomplete views of the high-resolution underlying distributions.
The proposed technique is tested on synthetic databases of fluid flow simulations, ocean surface temperature measurements, and particle image velocimetry data.
arXiv Detail & Related papers (2022-02-23T18:57:53Z)
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
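For reference, here is a minimal bootstrap particle filter with a designed (prior) proposal, the kind of baseline that the learned sampling distributions above are compared against; the state-space model and all parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bootstrap_pf(ys, n_particles=1_000, q=0.1, r=0.5, rng=None):
    """Bootstrap particle filter for a 1-D random walk observed in noise:
    x_t = x_{t-1} + N(0, q),  y_t = x_t + N(0, r).
    Uses the designed (prior) proposal; the papers above learn the proposal."""
    rng = rng or np.random.default_rng(0)
    parts = rng.normal(0.0, 1.0, n_particles)       # initial particle cloud
    means = []
    for y in ys:
        parts = parts + rng.normal(0.0, np.sqrt(q), n_particles)  # propagate
        logw = -0.5 * (y - parts) ** 2 / r                        # reweight
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(w @ parts)                                   # estimate
        parts = parts[rng.choice(n_particles, n_particles, p=w)]  # resample
    return np.array(means)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xs_true = np.cumsum(rng.normal(0.0, np.sqrt(0.1), 50))
    ys = xs_true + rng.normal(0.0, np.sqrt(0.5), 50)
    est = bootstrap_pf(ys)
    print(f"filter RMSE {np.sqrt(np.mean((est - xs_true) ** 2)):.3f}, "
          f"observation RMSE {np.sqrt(np.mean((ys - xs_true) ** 2)):.3f}")
```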
- Efficient Discretizations of Optimal Transport [16.996068297291057]
We propose an algorithm for calculating discretizations with a given number of points for marginal distributions.
We prove bounds for our approximation and demonstrate performance on a wide range of problems.
arXiv Detail & Related papers (2021-02-16T04:31:52Z)
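As a baseline for the discretization problem above, the exact OT plan between two already-discretized distributions can be computed as a small linear program. The sketch below uses scipy's HiGHS solver and is illustrative only: the referenced paper concerns choosing good support points, not solving this LP.

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(mu, nu, C):
    """Exact OT plan between discrete distributions mu, nu under cost C,
    solved as a linear program with marginal equality constraints."""
    m, n = C.shape
    A_eq, b_eq = [], []
    for i in range(m):                     # row marginals must equal mu
        row = np.zeros(m * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(mu[i])
    for j in range(n):                     # column marginals must equal nu
        col = np.zeros(m * n)
        col[j::n] = 1.0
        A_eq.append(col); b_eq.append(nu[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(m, n)

if __name__ == "__main__":
    x, y = np.array([0.0, 1.0, 2.0]), np.array([0.5, 2.5])
    mu, nu = np.array([0.4, 0.4, 0.2]), np.array([0.6, 0.4])
    cost, plan = discrete_ot(mu, nu, np.abs(x[:, None] - y[None, :]))
    print(f"W1 = {cost:.3f}")   # 0.700 for this example
    print(plan.round(3))
```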
- Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample based bidirectional learning algorithm for training.
arXiv Detail & Related papers (2021-02-05T04:25:28Z)
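In one dimension, the Wasserstein geodesic that the paper above learns in high dimension has a closed form: displacement interpolation of quantile functions. A minimal sketch under that assumption (not the paper's neural method):

```python
import numpy as np

def wasserstein_geodesic_1d(x0, x1, t, n_levels=500):
    """Quantiles of the point at time t on the W2 geodesic between two 1-D
    sample sets, via displacement interpolation of quantile functions."""
    levels = (np.arange(n_levels) + 0.5) / n_levels
    q0, q1 = np.quantile(x0, levels), np.quantile(x1, levels)
    return (1 - t) * q0 + t * q1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(-2.0, 0.5, 10_000)
    x1 = rng.normal(3.0, 1.5, 10_000)
    for t in (0.0, 0.5, 1.0):
        q = wasserstein_geodesic_1d(x0, x1, t)
        print(f"t={t}: mean ~ {q.mean():+.2f}, std ~ {q.std():.2f}")
    # Means and stds interpolate linearly (-2 -> 0.5 -> 3; 0.5 -> 1.0 -> 1.5),
    # as expected for Gaussians along a W2 geodesic.
```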