Online Realizable Regression and Applications for ReLU Networks
- URL: http://arxiv.org/abs/2602.19172v1
- Date: Sun, 22 Feb 2026 13:02:25 GMT
- Title: Online Realizable Regression and Applications for ReLU Networks
- Authors: Ilan Doron-Arad, Idan Mehalel, Elchanan Mossel,
- Abstract summary: We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics)<n>Our main contribution is a generic potential method that upper bounds $mathbbD_mathrmonl$ by a concrete Dudley-type entropy integral.
- Score: 14.292982828097465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an \emph{entropy potential} $Φ(\mathcal{H})=\int_{0}^{diam(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,Φ(\mathcal{H})$. In particular, polynomial metric entropy implies $Φ(\mathcal{H})<\infty$ and hence a horizon-free realizable cumulative-loss bound with transparent dependence on effective dimension. We illustrate the method on two families. We prove a sharp $q$-vs.-$d$ dichotomy for realizable online learning (finite and efficiently achievable $Θ_{d,q}(L^d)$ total loss for $L$-Lipschitz regression iff $q>d$, otherwise infinite), and for bounded-norm $k$-ReLU networks separate regression (finite loss, even $\widetilde O(k^2)$, and $O(1)$ for one ReLU) from classification (impossible already for $k=2,d=1$).
Related papers
- Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination [65.37519531362157]
We show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $tildeOmega(d1/2/alpha2)$.
arXiv Detail & Related papers (2025-10-12T15:42:44Z) - Learning and Computation of $Φ$-Equilibria at the Frontier of Tractability [85.07238533644636]
$Phi$-equilibria is a powerful and flexible framework at the heart of online learning and game theory.<n>We show that an efficient online algorithm incurs average $Phi$-regret at most $epsilon$ using $textpoly(d, k)/epsilon2$ rounds.<n>We also show nearly matching lower bounds in the online setting, thereby obtaining for the first time a family of deviations that captures the learnability of $Phi$-regret.
arXiv Detail & Related papers (2025-02-25T19:08:26Z) - Near-Optimal Algorithms for Omniprediction [6.874077229518565]
We give near-optimal learning algorithms for omniprediction, in both the online and offline settings.<n>Online learning algorithm achieves near-optimal complexity across a number of measures.<n> offline learning algorithm returns an efficient $(mathcalL_mathrmBV,mathcalH,varepsilon(m)$)
arXiv Detail & Related papers (2025-01-28T02:58:37Z) - Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
We derive approximation bounds for learning single neuron models using thresholded descent.
We also study the linear regression problem, where $sigma(mathbfx) = mathbfx$.
arXiv Detail & Related papers (2024-09-05T16:59:56Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $ell_p$ norm, or from a broad class of hinge-like loss functions.
We show that for sparse $ell$vareps regression, there is an oblivious distribution over sketches with $Theta(klog(d/k)/varepsilon2)$ rows, which is tight up to a constant factor.
We also show that sketching $O(mu2 klog(mu n d/varepsilon)/var
arXiv Detail & Related papers (2023-04-05T07:24:19Z) - Low-degree learning and the metric entropy of polynomials [44.99833362998488]
We prove that any (deterministic or randomized) algorithm which learns $mathscrF_nd$ with $L$-accuracy $varepsilon$ requires at least $Omega(sqrtvarepsilon)2dlog n leq log mathsfM(mathscrF_n,d,|cdot|_L,varepsilon) satisfies the two-sided estimate $$c (1-varepsilon)2dlog
arXiv Detail & Related papers (2022-03-17T23:52:08Z) - The Complexity of Dynamic Least-Squares Regression [11.815510373329337]
complexity of dynamic least-squares regression.
Goal is to maintain an $epsilon-approximate solution to $min_mathbfx(t)| mathbfA(t) mathbfb(t) |$ for all $tin.
arXiv Detail & Related papers (2022-01-01T18:36:17Z) - AI without networks [0.0]
We develop a network-free framework for AI incorporating generative modeling.
We demonstrate this framework with examples from three different disciplines - ethology, control theory, and mathematics.
We also propose an easily computed method of credit assignment based on this framework, to help address ethical-legal challenges raised by generative AI.
arXiv Detail & Related papers (2021-06-07T05:50:02Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.