Agnostic PAC Learning of k-juntas Using L2-Polynomial Regression
- URL: http://arxiv.org/abs/2303.04859v1
- Date: Wed, 8 Mar 2023 19:54:07 GMT
- Title: Agnostic PAC Learning of k-juntas Using L2-Polynomial Regression
- Authors: Mohsen Heidari and Wojciech Szpankowski
- Abstract summary: We present a new PAC learning algorithm based on the Fourier expansion with lower computational complexity.
We prove our results by connecting the PAC learning with 0-1 loss to the minimum mean square estimation problem.
We derive an elegant upper bound on the 0-1 loss in terms of the MMSE error and show that the sign of the MMSE is a PAC learner for any concept class containing it.
- Score: 9.732863739456036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many conventional learning algorithms rely on loss functions other than the
natural 0-1 loss for computational efficiency and theoretical tractability.
Among them are approaches based on absolute loss (L1 regression) and square
loss (L2 regression). The first is proved to be an \textit{agnostic} PAC
learner for various important concept classes such as \textit{juntas}, and
\textit{half-spaces}. On the other hand, the second is preferable because of
its computational efficiency, which is linear in the sample size. However, its
PAC learnability is still unknown, as guarantees have been proved only under
distributional restrictions. The question of whether L2 regression is an
agnostic PAC learner for 0-1 loss has been open since 1993 and has yet to be
answered.
This paper resolves this problem for the junta class on the Boolean cube --
proving agnostic PAC learning of k-juntas using L2 polynomial regression.
Moreover, we present a new PAC learning algorithm based on the Boolean Fourier
expansion with lower computational complexity. Fourier-based algorithms, such
as that of Linial et al. (1993), have been used under distributional
restrictions, such as the uniform distribution. We show that with an
appropriate change, one can apply
those algorithms in agnostic settings without any distributional assumption. We
prove our results by connecting the PAC learning with 0-1 loss to the minimum
mean square estimation (MMSE) problem. We derive an elegant upper bound on the
0-1 loss in terms of the MMSE error and show that the sign of the MMSE is a PAC
learner for any concept class containing it.
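To make the square-loss-to-0-1-loss connection concrete: for labels $Y \in \{-1,+1\}$ and any real-valued predictor $g$, a misclassification by $\mathrm{sgn}(g(X))$ forces $|g(X)-Y| \ge 1$, so the 0-1 loss is dominated by the square loss pointwise:

$$\Pr[\mathrm{sgn}(g(X)) \neq Y] \;\le\; \mathbb{E}\big[(g(X)-Y)^2\big],$$

a classical observation of the same flavor as the MMSE-based bound derived in the paper. Below is a minimal sketch of this recipe for $k$-juntas: least-squares regression of the $\pm 1$ labels onto all parity features of degree at most $k$, followed by sign rounding. The code and its names are illustrative assumptions, not the authors' implementation; the feature count $\sum_{d \le k} \binom{n}{d} = O(n^k)$ is what keeps the regression step efficient for constant $k$.

```python
# Hedged sketch: agnostic learning of a k-junta on {-1,+1}^n via
# L2 polynomial regression plus sign rounding (illustrative, not the
# paper's exact algorithm).
from itertools import combinations
import numpy as np

def parity_features(X, k):
    """Columns are the parities chi_S(x) = prod_{i in S} x_i for |S| <= k."""
    n = X.shape[1]
    cols = [np.ones(X.shape[0])]            # chi_{emptyset} = 1
    for d in range(1, k + 1):
        for S in combinations(range(n), d):
            cols.append(np.prod(X[:, S], axis=1))
    return np.stack(cols, axis=1)

def l2_poly_learner(X, y, k):
    """Fit min_c ||Phi c - y||_2 over degree-<=k polynomials; round by sign."""
    Phi = parity_features(X, k)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return lambda Xnew: np.sign(parity_features(Xnew, k) @ coef)

# Toy run: a noisy 2-junta y = x_0 * x_3 with 5% flipped labels, so the
# optimal 0-1 loss is about 0.05 and the learner should land near it.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(2000, 8))
y = X[:, 0] * X[:, 3]
y[rng.random(2000) < 0.05] *= -1
h = l2_poly_learner(X[:1500], y[:1500], k=2)
print("test 0-1 error:", np.mean(h(X[1500:]) != y[1500:]))
```

On this toy data the regression puts essentially all its weight on the parity $x_0 x_3$, and the rounded hypothesis attains test error close to the 5% noise rate, which is the agnostic benchmark for this instance.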
Related papers
- Is Transductive Learning Equivalent to PAC Learning? [0.9012198585960443]
We show that the PAC and transductive models are essentially equivalent for agnostic binary classification.
We leave as an intriguing open question whether our second result can be extended beyond binary classification to show the transductive and PAC models equivalent more broadly.
arXiv Detail & Related papers (2024-05-08T16:26:49Z)
- Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z)
- Distributional PAC-Learning from Nisan's Natural Proofs [0.0]
Natural proofs of circuit lower bounds for $\Lambda$ imply efficient algorithms for learning $\Lambda$-circuits.
We consider whether this implication can be generalized to $\Lambda \not\supseteq \mathrm{AC}^0[p]$, and to learning algorithms which use only random examples and learn over arbitrary example distributions.
arXiv Detail & Related papers (2023-10-05T16:13:29Z)
- Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
- On Agnostic PAC Learning using $\mathcal{L}_2$-polynomial Regression and Fourier-based Algorithms [10.66048003460524]
We develop a framework using Hilbert spaces as a proxy to analyze PAC learning problems with structural properties.
We demonstrate that PAC learning with 0-1 loss is equivalent to an optimization in the Hilbert space domain.
arXiv Detail & Related papers (2021-02-11T21:28:55Z)
- The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals [47.81107898315438]
We develop a method for finding hard families of examples for a wide class of problems using LP duality.
We show that $L_1$-regression is essentially best possible, and therefore that the computational difficulty of learning a concept class is closely related to the degree required to approximate any function from the class in $L_1$-norm.
arXiv Detail & Related papers (2021-02-08T18:06:32Z)
- Statistical-Query Lower Bounds via Functional Gradients [19.5924910463796]
We show that any statistical-query algorithm with tolerance $n^{-(1/\epsilon)^b}$ must use at least $2^{n^c}\epsilon$ queries for some constants $b, c > 0$.
Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems.
arXiv Detail & Related papers (2020-06-29T05:15:32Z)
- Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks [48.32532049640782]
We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
arXiv Detail & Related papers (2020-06-22T17:53:54Z)
- Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
- Supervised Learning: No Loss No Cry [51.07683542418145]
Supervised learning requires the specification of a loss function to minimise.
This paper revisits the \textsc{SLIsotron} algorithm of Kakade et al. (2011) through a novel lens.
We show how it provides a principled procedure for learning the loss.
arXiv Detail & Related papers (2020-02-10T05:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.