Omnipredictors for Regression and the Approximate Rank of Convex
Functions
- URL: http://arxiv.org/abs/2401.14645v1
- Date: Fri, 26 Jan 2024 04:29:53 GMT
- Title: Omnipredictors for Regression and the Approximate Rank of Convex
Functions
- Authors: Parikshit Gopalan, Princewill Okoroafor, Prasad Raghavendra, Abhishek
Shetty, Mihir Singhal
- Abstract summary: An \textit{omnipredictor} for a class $\mathcal{L}$ of loss functions and a class $\mathcal{C}$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal{C}$ for every loss in $\mathcal{L}$.
- Score: 7.549720420141907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consider the supervised learning setting where the goal is to learn to
predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An
\textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class
$\mathcal C$ of hypotheses is a predictor whose predictions incur less expected
loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$.
Since the work of [GKR+21] that introduced the notion, there has been a large
body of work in the setting of binary labels where $\mathbf y \in \{0, 1\}$,
but much less is known about the regression setting where $\mathbf y \in [0,1]$
can be continuous. Our main conceptual contribution is the notion of
\textit{sufficient statistics} for loss minimization over a family of loss
functions: these are a set of statistics about a distribution such that knowing
them allows one to take actions that minimize the expected loss for any loss in
the family. The notion of sufficient statistics relates directly to the
approximate rank of the family of loss functions.
Our key technical contribution is a bound of $O(1/\varepsilon^{2/3})$ on the
$\varepsilon$-approximate rank of convex, Lipschitz functions on the interval
$[0,1]$, which we show is tight up to a factor of
$\mathrm{polylog}(1/\varepsilon)$. This yields improved runtimes for learning
omnipredictors for
the class of all convex, Lipschitz loss functions under weak learnability
assumptions about the class $\mathcal C$. We also give efficient omnipredictors
when the loss families have low-degree polynomial approximations, or arise from
generalized linear models (GLMs). This translation from sufficient statistics
to faster omnipredictors is made possible by lifting the technique of loss
outcome indistinguishability introduced by [GKH+23] for Boolean labels to the
regression setting.
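
As a rough, self-contained illustration of the approximate-rank statement, the sketch below (Python) discretizes a family of convex, 1-Lipschitz functions on $[0,1]$ and counts how many SVD components suffice for an entrywise $\varepsilon$-approximation. The hinge family $f_a(x) = \max(x-a, 0)$, the grid size, and the SVD-truncation heuristic are illustrative assumptions rather than the paper's construction; since truncated SVD optimizes Frobenius rather than sup-norm error, the count is only a heuristic upper bound on the $\varepsilon$-approximate rank of the discretized matrix.

```python
import numpy as np

def approx_components(eps, grid_size=400):
    """Heuristic upper bound on the eps-approximate rank of a discretized
    family of convex, 1-Lipschitz functions on [0, 1].

    Illustrative family (an assumption, not the paper's construction):
    hinge functions f_a(x) = max(x - a, 0) for thresholds a on a grid.
    """
    x = np.linspace(0.0, 1.0, grid_size)          # discretized domain [0, 1]
    a = np.linspace(0.0, 1.0, grid_size)          # hinge thresholds
    M = np.maximum(x[None, :] - a[:, None], 0.0)  # M[i, j] = f_{a_i}(x_j)

    # Truncated SVD minimizes Frobenius/spectral error, not entrywise error,
    # so the rank found below only upper-bounds the eps-approximate rank.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    for r in range(1, len(s) + 1):
        approx = (U[:, :r] * s[:r]) @ Vt[:r, :]
        if np.max(np.abs(M - approx)) <= eps:
            return r
    return len(s)

if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.02, 0.01):
        r = approx_components(eps)
        print(f"eps={eps:5.2f}  components={r:4d}  1/eps^(2/3)={eps ** (-2.0 / 3.0):6.1f}")
```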
Related papers
- Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not, and we refer to this as a \emph{near mixing-free} rate, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z) - Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ norm regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ ...
arXiv Detail & Related papers (2023-04-05T07:24:19Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces
and ReLU Regression under Gaussian Marginals [43.0867217287089]
We study the task of agnostically learning halfspaces under the Gaussian distribution.
We prove a near-optimal computational hardness result for this task.
arXiv Detail & Related papers (2023-02-13T16:46:23Z) - LegendreTron: Uprising Proper Multiclass Loss Learning [22.567234503869845]
Loss functions serve as the foundation of supervised learning and are often chosen prior to model development.
Recent works have sought to \emph{learn} losses and models jointly.
We present \textsc{LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems.
arXiv Detail & Related papers (2023-01-27T13:10:45Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Omnipredictors [19.735769148626588]
Loss minimization is a dominant paradigm in machine learning.
We introduce the notion of an $(\mathcal{L},\mathcal{C})$-omnipredictor, which could be used to optimize any loss in a family.
We show that such "loss-oblivious" learning is feasible through a connection to multicalibration.
arXiv Detail & Related papers (2021-09-11T23:28:49Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.