Turing-Universal Learners with Optimal Scaling Laws
- URL: http://arxiv.org/abs/2111.05321v1
- Date: Tue, 9 Nov 2021 18:44:35 GMT
- Title: Turing-Universal Learners with Optimal Scaling Laws
- Authors: Preetum Nakkiran
- Abstract summary: We observe the existence of a "universal learner" algorithm, which achieves the best possible distribution-dependent rate among all learning algorithms.
The construction itself is a simple extension of Levin's universal search (Levin, 1973)
- Score: 2.7485183218472016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For a given distribution, learning algorithm, and performance metric, the
rate of convergence (or data-scaling law) is the asymptotic behavior of the
algorithm's test performance as a function of number of train samples. Many
learning methods in both theory and practice have power-law rates, i.e.
performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both
theoreticians and practitioners are concerned with improving the rates of their
learning algorithms under settings of interest. We observe the existence of a
"universal learner", which achieves the best possible distribution-dependent
asymptotic rate among all learning algorithms within a specified runtime (e.g.
$O(n^2)$), while incurring only polylogarithmic slowdown over this runtime.
This algorithm is uniform, and does not depend on the distribution, and yet
achieves best-possible rates for all distributions. The construction itself is
a simple extension of Levin's universal search (Levin, 1973). And much like
universal search, the universal learner is not at all practical, and is
primarily of theoretical and philosophical interest.
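The construction follows Levin's universal search: interleave candidate learning programs on exponentially decaying time budgets and keep the hypothesis that performs best on held-out data. Below is a minimal, purely illustrative Python sketch of this idea under simplifying assumptions: a finite pool of candidate learners stands in for the enumeration of all programs, and the names (universal_learner, candidate_learners, time_budget, score) are hypothetical, not from the paper.

```python
import math

def universal_learner(train, val, candidate_learners, time_budget, score):
    """Levin-style universal learner sketch (illustrative, not the paper's exact construction).

    Candidate i receives roughly a 2^-(i+1) share of the total time budget;
    the hypothesis with the best held-out validation score is returned.
    In the actual construction the candidates are an enumeration of all
    programs, and the overhead over any fixed target runtime (e.g. O(n^2))
    is only polylogarithmic.
    """
    best_hypothesis, best_score = None, -math.inf
    for i, learner in enumerate(candidate_learners):
        budget_i = time_budget * 2.0 ** (-(i + 1))  # exponentially decaying time slice
        hypothesis = learner(train, budget_i)       # run the candidate within its slice
        if hypothesis is None:                      # candidate did not finish in time
            continue
        s = score(hypothesis, val)                  # evaluate on held-out data
        if s > best_score:
            best_hypothesis, best_score = hypothesis, s
    return best_hypothesis
```

Because the schedule charges candidate i only a geometric fraction of the budget, running all candidates costs a constant factor over running the best one alone; the polylogarithmic slowdown in the paper comes from a more careful enumeration and model-selection argument.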
Related papers
- Near-Polynomially Competitive Active Logistic Regression [6.600655187282174]
It is well known that active learning can require exponentially fewer label queries compared to passive learning.
We present the first algorithm that is competitive with the optimal algorithm on every input.
Our algorithm is based on efficient sampling and can be extended to learn a more general class of functions.
arXiv Detail & Related papers (2025-03-07T23:31:58Z)
- Learning to Play Against Unknown Opponents [9.346742321348366]
We show how to efficiently construct an optimal learning algorithm when the learning agent is not constrained to be no-regret.
All these results make use of recently developed machinery that converts the analysis of learning algorithms to the class of corresponding geometric objects known as menus.
arXiv Detail & Related papers (2024-12-24T09:05:06Z)
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- Efficient Discrepancy Testing for Learning with Distribution Shift [17.472049019016524]
We provide the first set of provably efficient algorithms for testing localized discrepancy distance.
Results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift.
arXiv Detail & Related papers (2024-06-13T17:51:10Z)
- A Competitive Algorithm for Agnostic Active Learning [5.4579367210379335]
Most popular algorithms for active learning express their performance in terms of a parameter called the disagreement coefficient.
We get an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$.
It is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.
arXiv Detail & Related papers (2023-10-28T19:01:16Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification.
We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-11-09T02:33:36Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- A Theory of Universal Learning [26.51949485387526]
We show that there are only three possible rates of universal learning.
We show that the learning curves of any given concept class decay at either an exponential, linear, or arbitrarily slow rate.
arXiv Detail & Related papers (2020-11-09T15:10:32Z)
- Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$, where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
- Complete Dictionary Learning via $\ell_p$-norm Maximization [10.82081111170268]
We investigate a family of $\ell_p$-norm ($p > 2$, $p \in \mathbb{N}$) approaches for the complete dictionary learning problem.
We show that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present.
Experiments demonstrate that the $\ell_p$-based approaches enjoy higher computational efficiency and better robustness than conventional approaches.
arXiv Detail & Related papers (2020-02-24T02:33:01Z)