Turing-Universal Learners with Optimal Scaling Laws
- URL: http://arxiv.org/abs/2111.05321v1
- Date: Tue, 9 Nov 2021 18:44:35 GMT
- Title: Turing-Universal Learners with Optimal Scaling Laws
- Authors: Preetum Nakkiran
- Abstract summary: We observe the existence of a "universal learner" algorithm, which achieves the best possible distribution-dependent rate among all learning algorithms.
The construction itself is a simple extension of Levin's universal search (Levin, 1973)
- Score: 2.7485183218472016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For a given distribution, learning algorithm, and performance metric, the
rate of convergence (or data-scaling law) is the asymptotic behavior of the
algorithm's test performance as a function of number of train samples. Many
learning methods in both theory and practice have power-law rates, i.e.
performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both
theoreticians and practitioners are concerned with improving the rates of their
learning algorithms under settings of interest. We observe the existence of a
"universal learner", which achieves the best possible distribution-dependent
asymptotic rate among all learning algorithms within a specified runtime (e.g.
$O(n^2)$), while incurring only polylogarithmic slowdown over this runtime.
This algorithm is uniform, and does not depend on the distribution, and yet
achieves best-possible rates for all distributions. The construction itself is
a simple extension of Levin's universal search (Levin, 1973). And much like
universal search, the universal learner is not at all practical, and is
primarily of theoretical and philosophical interest.
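The construction follows Levin's universal search: interleave candidate learning programs on exponentially decaying time budgets and keep the hypothesis that performs best on held-out data. Below is a minimal, purely illustrative Python sketch of this idea under simplifying assumptions: a finite pool of candidate learners stands in for the enumeration of all programs, and the names (universal_learner, candidate_learners, time_budget, score) are hypothetical, not from the paper.

```python
import math

def universal_learner(train, val, candidate_learners, time_budget, score):
    """Levin-style universal learner sketch (illustrative, not the paper's exact construction).

    Candidate i receives roughly a 2^-(i+1) share of the total time budget;
    the hypothesis with the best held-out validation score is returned.
    In the actual construction the candidates are an enumeration of all
    programs, and the overhead over any fixed target runtime (e.g. O(n^2))
    is only polylogarithmic.
    """
    best_hypothesis, best_score = None, -math.inf
    for i, learner in enumerate(candidate_learners):
        budget_i = time_budget * 2.0 ** (-(i + 1))  # exponentially decaying time slice
        hypothesis = learner(train, budget_i)       # run the candidate within its slice
        if hypothesis is None:                      # candidate did not finish in time
            continue
        s = score(hypothesis, val)                  # evaluate on held-out data
        if s > best_score:
            best_hypothesis, best_score = hypothesis, s
    return best_hypothesis
```

Because the schedule charges candidate i only a geometric fraction of the budget, running all candidates costs a constant factor over running the best one alone; the polylogarithmic slowdown in the paper comes from a more careful enumeration and model-selection argument.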
Related papers
- Near-Polynomially Competitive Active Logistic Regression [6.600655187282174]
It is well known that active learning can require exponentially fewer label queries compared to passive learning.
We present the first algorithm that is competitive with the optimal algorithm on every input.
Our algorithm is based on efficient sampling and can be extended to learn a more general class of functions.
arXiv Detail & Related papers (2025-03-07T23:31:58Z)
- Learning to Play Against Unknown Opponents [9.346742321348366]
We show how to efficiently construct an optimal learning algorithm when the learning agent is not constrained to be no-regret.
All these results make use of recently developed machinery that converts the analysis of learning algorithms to the class of corresponding geometric objects known as menus.
arXiv Detail & Related papers (2024-12-24T09:05:06Z)
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- Efficient Discrepancy Testing for Learning with Distribution Shift [17.472049019016524]
We provide the first set of provably efficient algorithms for testing localized discrepancy distance.
Results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift.
arXiv Detail & Related papers (2024-06-13T17:51:10Z)
- A Competitive Algorithm for Agnostic Active Learning [5.4579367210379335]
Most popular algorithms for active learning express their performance in terms of a parameter called the disagreement coefficient.
We get an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$.
It is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.
arXiv Detail & Related papers (2023-10-28T19:01:16Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification.
We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-11-09T02:33:36Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- A Theory of Universal Learning [26.51949485387526]
We show that there are only three possible rates of universal learning.
We show that the learning curves of any given concept class decay at either an exponential, linear, or arbitrarily slow rate.
arXiv Detail & Related papers (2020-11-09T15:10:32Z)
- Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$, where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
- Complete Dictionary Learning via $\ell_p$-norm Maximization [10.82081111170268]
We investigate a family of $\ell_p$-norm ($p > 2$, $p \in \mathbb{N}$) approaches for the complete dictionary learning problem.
We show that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present.
Experiments demonstrate that the $\ell_p$-based approaches enjoy higher computational efficiency and better robustness than conventional approaches.
arXiv Detail & Related papers (2020-02-24T02:33:01Z)