KL Divergence Estimation with Multi-group Attribution
- URL: http://arxiv.org/abs/2202.13576v1
- Date: Mon, 28 Feb 2022 06:54:10 GMT
- Title: KL Divergence Estimation with Multi-group Attribution
- Authors: Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi
Wieder
- Abstract summary: Estimating the Kullback-Leibler (KL) divergence between two distributions is well-studied in machine learning and information theory.
Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations.
- Score: 25.7757954754825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the Kullback-Leibler (KL) divergence between two distributions
given samples from them is well-studied in machine learning and information
theory. Motivated by considerations of multi-group fairness, we seek KL
divergence estimates that accurately reflect the contributions of
sub-populations to the overall divergence. We model the sub-populations coming
from a rich (possibly infinite) family $\mathcal{C}$ of overlapping subsets of
the domain. We propose the notion of multi-group attribution for $\mathcal{C}$,
which requires that the estimated divergence conditioned on every
sub-population in $\mathcal{C}$ satisfies some natural accuracy and fairness
desiderata, such as ensuring that sub-populations where the model predicts
significant divergence do diverge significantly in the two distributions. Our
main technical contribution is to show that multi-group attribution can be
derived from the recently introduced notion of multi-calibration for importance
weights [HKRR18, GRSW21]. We provide experimental evidence to support our
theoretical results, and show that multi-group attribution provides better KL
divergence estimates when conditioned on sub-populations than other popular
algorithms.
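For concreteness, the sketch below shows how a plug-in KL estimate and its sub-population-conditioned versions can be read off from samples once importance weights $w(x) \approx p(x)/q(x)$ are available from some estimator. It is a minimal illustration under stated assumptions, not the paper's multicalibration-based construction: the Gaussian example, the half-line subgroups standing in for $\mathcal{C}$, and the particular conditional quantity (the KL between the two distributions restricted to a subgroup) are choices made only for this example.

```python
import numpy as np

def conditional_kl_estimates(x_p, weights_p, groups):
    """Plug-in estimates of KL(p || q), overall and restricted to sub-populations.

    x_p       -- samples drawn from p
    weights_p -- estimated importance weights w(x) ~= p(x)/q(x) at those samples
    groups    -- dict mapping a group name to a boolean indicator function

    Overall: KL(p || q) = E_{x~p}[log(p(x)/q(x))] ~= sample mean of log w(x).
    Per group C: the KL between the conditionals p(.|C) and q(.|C) is estimated as
        mean_{x in C} log w(x) - log(p(C) / q(C)),
    with p(C) the fraction of p-samples in C and q(C) estimated via E_q[1_C] = E_p[1_C / w].
    """
    logw = np.log(weights_p)
    estimates = {"overall": float(logw.mean())}
    for name, indicator in groups.items():
        mask = np.array([indicator(x) for x in x_p], dtype=bool)
        if not mask.any():
            continue
        p_c = mask.mean()                    # estimate of p(C)
        q_c = (mask / weights_p).mean()      # estimate of q(C) = E_p[1_C / w]
        estimates[name] = float(logw[mask].mean() - np.log(p_c / q_c))
    return estimates

# Illustrative usage: p = N(0.5, 1), q = N(0, 1), so w(x) = exp((x - 0.25)/2) is the
# exact importance weight and KL(p || q) = 0.125.  The two half-lines stand in for C.
rng = np.random.default_rng(0)
x_p = rng.normal(loc=0.5, scale=1.0, size=100_000)
w_p = np.exp((x_p - 0.25) / 2.0)
print(conditional_kl_estimates(x_p, w_p, {"x > 0": lambda x: x > 0,
                                          "x <= 0": lambda x: x <= 0}))
```

In the paper's terms, multi-group attribution requires such conditional estimates to satisfy accuracy and fairness desiderata simultaneously for every subgroup in $\mathcal{C}$, which is what multicalibrated importance weights are shown to deliver.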
Related papers
- Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications [37.349358118385155]
Divergence measures play a central role in deep learning and are becoming increasingly essential.
We introduce a new measure tailored to multiple distributions, named the generalized Cauchy-Schwarz divergence (GCSD).
arXiv Detail & Related papers (2024-05-07T07:07:44Z)
- How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance [64.1656365676171]
Group imbalance has been a known problem in empirical risk minimization.
This paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance.
arXiv Detail & Related papers (2024-03-12T04:38:05Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Bandit Pareto Set Identification: the Fixed Budget Setting [12.326452468513228]
We study a pure exploration problem in a multi-armed bandit model.
The goal is to identify the distributions whose mean is not uniformly worse than that of another distribution.
arXiv Detail & Related papers (2023-11-07T13:43:18Z)
- Understanding Contrastive Learning via Distributionally Robust Optimization [29.202594242468678]
This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels).
We bridge this research gap by analyzing CL through the lens of distributionally robust optimization (DRO), yielding several key insights.
We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues.
arXiv Detail & Related papers (2023-10-17T07:32:59Z)
- Reweighted Mixup for Subpopulation Shift [63.1315456651771]
Subpopulation shift exists in many real-world applications; it refers to settings where the training and test distributions contain the same subpopulation groups but in different proportions.
Importance reweighting is a classical and effective way to handle the subpopulation shift.
We propose a simple yet practical framework, called reweighted mixup, to mitigate the overfitting issue.
arXiv Detail & Related papers (2023-04-09T03:44:50Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE (a minimal sketch of the standard classifier-based binary DRE baseline appears after this list).
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- Unintended Selection: Persistent Qualification Rate Disparities and Interventions [6.006936459950188]
We study the dynamics of group-level disparities in machine learning.
In particular, we desire models that do not suppose inherent differences between artificial groups of people.
We show that differences in qualification rates between subpopulations can persist indefinitely for a set of non-trivial equilibrium states.
arXiv Detail & Related papers (2021-11-01T18:53:54Z)
- Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
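Referenced from the density ratio estimation entry above: the importance weights assumed in the sketch after the abstract have to come from some estimator, and a standard baseline for binary DRE is the probabilistic-classification trick, i.e., train a classifier to separate p-samples from q-samples and convert its predicted probabilities into density ratios. The sketch below is an assumption-laden illustration using scikit-learn's logistic regression on a hypothetical Gaussian pair; it is neither the Bregman-divergence framework of that paper nor the multicalibrated weights of this one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classifier_importance_weights(x_p, x_q):
    """Classifier-based binary density ratio estimation (a standard baseline).

    Train a classifier to tell p-samples (label 1) from q-samples (label 0).
    With calibrated probabilities s(x) = P(label = 1 | x) and equally sized
    samples, p(x)/q(x) ~= s(x) / (1 - s(x)).
    """
    X = np.concatenate([x_p, x_q]).reshape(-1, 1)
    y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])
    clf = LogisticRegression().fit(X, y)
    s = clf.predict_proba(x_p.reshape(-1, 1))[:, 1]
    return s / (1.0 - s)

# Illustrative usage on the same hypothetical Gaussian pair as above.
rng = np.random.default_rng(1)
x_p = rng.normal(0.5, 1.0, size=50_000)
x_q = rng.normal(0.0, 1.0, size=50_000)
w_hat = classifier_importance_weights(x_p, x_q)
print("plug-in KL estimate:", float(np.log(w_hat).mean()))  # should land near 0.125
```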
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.