HappyMap: A Generalized Multi-calibration Method
- URL: http://arxiv.org/abs/2303.04379v1
- Date: Wed, 8 Mar 2023 05:05:01 GMT
- Title: HappyMap: A Generalized Multi-calibration Method
- Authors: Zhun Deng, Cynthia Dwork, Linjun Zhang
- Abstract summary: Multi-calibration is a powerful and evolving concept originating in the field of algorithmic fairness.
In this work, we view the term $(f(x)-y)$ as just one specific mapping, and explore the power of an enriched class of mappings.
We propose \textit{HappyMap}, a generalization of multi-calibration, which yields a wide range of new applications.
- Score: 23.086009024383024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-calibration is a powerful and evolving concept originating in the field
of algorithmic fairness. For a predictor $f$ that estimates the outcome $y$
given covariates $x$, and for a function class $\mathcal{C}$, multi-calibration
requires that the predictor $f(x)$ and outcome $y$ are indistinguishable under
the class of auditors in $\mathcal{C}$. Fairness is captured by incorporating
demographic subgroups into the class of functions~$\mathcal{C}$. Recent work
has shown that, by enriching the class $\mathcal{C}$ to incorporate appropriate
propensity re-weighting functions, multi-calibration also yields
target-independent learning, wherein a model trained on a source domain
performs well on unseen, future target domains (approximately) captured by the
re-weightings.
Formally, multi-calibration with respect to $\mathcal{C}$ bounds
$\big|\mathbb{E}_{(x,y)\sim \mathcal{D}}[c(f(x),x)\cdot(f(x)-y)]\big|$ for all
$c \in \mathcal{C}$. In this work, we view the term $(f(x)-y)$ as just one
specific mapping, and explore the power of an enriched class of mappings. We
propose \textit{HappyMap}, a generalization of multi-calibration, which yields
a wide range of new applications, including a new fairness notion for
uncertainty quantification (conformal prediction), a novel technique for
conformal prediction under covariate shift, and a different approach to
analyzing missing data, while also yielding a unified understanding of several
existing seemingly disparate algorithmic fairness notions and
target-independent learning approaches.
We give a single \textit{HappyMap} meta-algorithm that captures all these
results, together with a sufficiency condition for its success.
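To illustrate the flavor of such a meta-algorithm, the following is a minimal sketch of an audit-and-patch loop in which the residual term $(f(x)-y)$ of standard multi-calibration is swapped for a general mapping $s(f(x),x,y)$. This is not the paper's pseudocode; the names `happymap`, `auditors`, `s_map`, `alpha`, and `eta` are illustrative.

```python
# A minimal sketch (not the authors' pseudocode) of an audit-and-patch loop in
# the style of the HappyMap meta-algorithm: standard multi-calibration audits
# E[c(f(x), x) * (f(x) - y)]; here the residual is replaced by a general
# mapping s(f(x), x, y). All names below are illustrative.
import numpy as np

def happymap(f, x, y, auditors, s_map, alpha=0.01, eta=0.1, max_iter=1000):
    """Patch predictions until no auditor c has |E[c(f(x),x) * s(f(x),x,y)]| > alpha.

    f        : initial predictions, shape (n,)
    x        : covariates, shape (n, d)
    y        : outcomes, shape (n,)
    auditors : list of callables c(f_vals, x) -> shape (n,)
    s_map    : callable s(f_vals, x, y) -> shape (n,); the choice
               s_map = lambda f_vals, x, y: f_vals - y recovers multi-calibration
    """
    f = np.asarray(f, dtype=float).copy()
    for _ in range(max_iter):
        s_vals = s_map(f, x, y)
        # Audit step: find the auditor most correlated with the mapping.
        scores = np.array([np.mean(c(f, x) * s_vals) for c in auditors])
        worst = int(np.argmax(np.abs(scores)))
        if abs(scores[worst]) <= alpha:      # every audit passes: done
            return f
        # Patch step: update predictions against the violating direction.
        f -= eta * np.sign(scores[worst]) * auditors[worst](f, x)
        f = np.clip(f, 0.0, 1.0)             # keep predictions in [0, 1]
    return f
```

With `s_map = lambda f_vals, x, y: f_vals - y` and auditors built from subgroup indicators, the loop reduces to standard multi-calibration patching; other choices of `s_map` are the kind of substitutions the abstract alludes to for conformal prediction and missing data.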
Related papers
- One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization [18.427215139020625]
We show that the best classification performance is achieved when $f(\cdot) = \|\cdot\|^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large $\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - How Many and Which Training Points Would Need to be Removed to Flip this
Prediction? [34.9118528281516]
We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$ such that, had the instances comprising $\mathcal{S}_t$ been removed prior to training, the categorization of a given test point $x_t$ would have been different.
We propose comparatively fast approximation methods to find $\mathcal{S}_t$ based on influence functions.
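As a rough illustration of the influence-function idea (not this paper's method), the sketch below estimates, for a regularized logistic-regression model, how much removing each training point would shift the test point's logit and greedily accumulates the most helpful removals; all names (`flip_set`, `reg`, ...) are illustrative.

```python
# Illustrative influence-function heuristic for finding a small set of training
# points whose removal is predicted to flip a test prediction. This is a sketch
# of the general idea only, not the paper's algorithm.
import numpy as np

def flip_set(theta, X_train, y_train, x_test, reg=1e-3):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    p = sigmoid(X_train @ theta)
    # Hessian of the L2-regularized logistic loss at theta.
    H = (X_train * (p * (1 - p))[:, None]).T @ X_train + reg * np.eye(len(theta))
    # Per-example gradients of the (unregularized) training loss, y in {0, 1}.
    grads = (p - y_train)[:, None] * X_train
    # Removing example i shifts theta by roughly H^{-1} grad_i, so the test
    # logit shifts by roughly x_test^T H^{-1} grad_i.
    shifts = np.linalg.solve(H, grads.T).T @ x_test
    logit = x_test @ theta
    target = -np.sign(logit)                  # sign the logit must reach to flip
    order = np.argsort(-target * shifts)      # most flip-helpful removals first
    removed, total = [], 0.0
    for i in order:
        if target * shifts[i] <= 0:           # remaining removals do not help
            break
        removed.append(int(i))
        total += shifts[i]
        if np.sign(logit + total) == target:  # predicted flip achieved
            break
    return removed
```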
arXiv Detail & Related papers (2023-02-04T13:55:12Z) - LegendreTron: Uprising Proper Multiclass Loss Learning [22.567234503869845]
Loss functions serve as the foundation of supervised learning and are often chosen prior to model development.
Recent works have sought to \emph{learn} losses and models jointly.
We present \textsc{LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems.
arXiv Detail & Related papers (2023-01-27T13:10:45Z) - Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems [8.0153031008486]
We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
arXiv Detail & Related papers (2023-01-08T21:46:27Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
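A minimal sketch of the noise-perturbation idea mentioned above, assuming a simple ridge-regression value-fitting step; this is illustrative only, not the paper's algorithm, and `perturbed_regression`, `sigma`, and `reg` are made-up names.

```python
# Illustrative RLSVI-style perturbation: adding i.i.d. Gaussian noise to the
# regression targets yields a randomized value estimate that drives exploration.
import numpy as np

def perturbed_regression(features, targets, sigma=1.0, reg=1.0, rng=None):
    """Fit ridge regression on targets perturbed by i.i.d. Gaussian noise.

    features : array of shape (n, d), e.g. state-action features
    targets  : array of shape (n,), e.g. reward-plus-next-value targets
    sigma    : standard deviation of the i.i.d. perturbation noise
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = targets + sigma * rng.standard_normal(len(targets))
    A = features.T @ features + reg * np.eye(features.shape[1])
    return np.linalg.solve(A, features.T @ noisy)   # randomized value weights
```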
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.