Loss Minimization Yields Multicalibration for Large Neural Networks
- URL: http://arxiv.org/abs/2304.09424v2
- Date: Fri, 8 Dec 2023 04:41:03 GMT
- Title: Loss Minimization Yields Multicalibration for Large Neural Networks
- Authors: Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran
- Abstract summary: Multicalibration is a notion of fairness for predictors that requires them to provide calibrated predictions across a large set of protected groups.
We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$.
- Score: 16.047146428254592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multicalibration is a notion of fairness for predictors that requires them to
provide calibrated predictions across a large set of protected groups.
Multicalibration is known to be a goal distinct from loss minimization, even
for simple predictors such as linear functions.
In this work, we consider the setting where the protected groups can be
represented by neural networks of size $k$, and the predictors are neural
networks of size $n > k$. We show that minimizing the squared loss over all
neural nets of size $n$ implies multicalibration for all but a bounded number
of unlucky values of $n$. We also give evidence that our bound on the number of
unlucky values is tight, given our proof technique. Previously, results of the
flavor that loss minimization yields multicalibration were known only for
predictors that were near the ground truth, and hence were rather limited in
applicability. Unlike these, our results rely on the expressivity of neural
nets and utilize the representation of the predictor.
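To make the guarantee concrete, here is a minimal sketch (assuming NumPy; the function name, binning scheme, and mass threshold are illustrative choices, not the paper's algorithm) of auditing a predictor for multicalibration under one standard formalization: for every protected group and every prediction level (here, a bin) carrying non-negligible mass, the average outcome should be close to the average prediction. Group membership is assumed to be supplied as boolean arrays, e.g. obtained by thresholding the outputs of size-$k$ auditor networks.

```python
import numpy as np

def multicalibration_error(preds, labels, groups, n_bins=10, min_mass=0.01):
    """Largest calibration gap over (group, prediction-bin) cells.

    preds  : (N,) array of predictions in [0, 1]
    labels : (N,) array of outcomes in [0, 1]
    groups : list of (N,) boolean arrays, each marking membership in one
             protected group (e.g. the thresholded output of a size-k net)
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(preds)
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)

    worst = 0.0
    for g in groups:
        g = np.asarray(g, dtype=bool)
        for b in range(n_bins):
            cell = g & (bins == b)
            if cell.sum() < min_mass * n:   # ignore cells with negligible mass
                continue
            gap = abs(labels[cell].mean() - preds[cell].mean())
            worst = max(worst, gap)
    return worst
```

Binning is only one way to discretize prediction values; the paper works with its own formal notion of multicalibration error, so the gap computed here should be read as an informal proxy for that quantity.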
Related papers
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z)
- Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work [59.29606307518154]
We show that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), the network's expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.
We also consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
arXiv Detail & Related papers (2022-10-21T14:41:26Z)
- ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret [97.73233271730616]
Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies).
DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).
We show that a deep learning version of ESCHER outperforms the prior state of the art -- DREAM and neural fictitious self play (NFSP) -- and the difference becomes dramatic as game size increases.
arXiv Detail & Related papers (2022-06-08T18:43:45Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Generalized Quantile Loss for Deep Neural Networks [0.8594140167290096]
This note presents a simple way to add a count (or quantile) constraint to a regression neural net, such that given $n$ samples in the training set it guarantees that the prediction of $m < n$ samples will be larger than the actual value (the label).
Unlike standard quantile regression networks, the presented method can be applied to any loss function, not only the standard quantile regression loss, which minimizes mean absolute differences (a hedged sketch of the idea follows this entry).
arXiv Detail & Related papers (2020-12-28T16:37:02Z)
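As a hedged illustration of the count-constraint idea in the note above (assuming NumPy; the function name is illustrative and the paper's actual construction may differ), one can weight an arbitrary per-sample base loss asymmetrically by the sign of the residual. With absolute error as the base loss this reduces, up to scaling, to the standard pinball loss, whose minimizer places roughly a $\tau = m/n$ fraction of predictions above the labels; for other base losses the count property holds only approximately in this sketch.

```python
import numpy as np

def generalized_quantile_loss(y_true, y_pred, base_loss, tau):
    """Asymmetrically weight an arbitrary per-sample base loss.

    tau ~ m/n is the target fraction of samples whose prediction should end
    up above the label. With base_loss = absolute error this is the standard
    pinball (quantile regression) loss.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_sample = base_loss(y_true, y_pred)       # any elementwise loss
    under = y_true > y_pred                      # prediction fell below the label
    weights = np.where(under, tau, 1.0 - tau)    # larger tau penalizes under-prediction more
    return float(np.mean(weights * per_sample))

# Example: squared error as the base loss, aiming for ~80% of predictions above labels.
y = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1.2, 1.8, 3.5, 3.9])
loss = generalized_quantile_loss(y, p, lambda a, b: (a - b) ** 2, tau=0.8)
```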
- Flatness is a False Friend [0.7614628596146599]
Hessian-based measures of flatness have been argued, used, and shown to relate to generalisation.
We show that, for feed-forward neural networks under the cross-entropy loss, we would expect low-loss solutions with large weights to have small Hessian-based measures of flatness (a minimal sketch of one such measure follows this entry).
arXiv Detail & Related papers (2020-06-16T11:55:24Z)
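For context on what a "Hessian-based measure of flatness" looks like in practice, here is a minimal sketch (assuming PyTorch; the function name is illustrative and the code is not taken from the paper above) of Hutchinson's estimator for the trace of the loss Hessian, computed from Hessian-vector products:

```python
import torch

def hessian_trace(loss, params, n_probes=16):
    """Hutchinson estimate of tr(H), where H is the Hessian of `loss`
    with respect to `params`, via Hessian-vector products."""
    params = [p for p in params if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_probes):
        # Rademacher (+1/-1) probe vectors with the same shapes as the parameters.
        vs = [(torch.rand_like(p) < 0.5).to(p.dtype) * 2 - 1 for p in params]
        # Hessian-vector product: gradient of <grad, v> with respect to the parameters.
        hvs = torch.autograd.grad(
            sum((g * v).sum() for g, v in zip(grads, vs)),
            params, retain_graph=True)
        estimate += sum((hv * v).sum().item() for hv, v in zip(hvs, vs))
    return estimate / n_probes
```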