A Theory of Universal Learning
- URL: http://arxiv.org/abs/2011.04483v1
- Date: Mon, 9 Nov 2020 15:10:32 GMT
- Title: A Theory of Universal Learning
- Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Ramon van Handel, Amir
Yehudayoff
- Abstract summary: We show that there are only three possible rates of universal learning.
We show that the learning curves of any given concept class decay at an exponential, linear, or arbitrarily slow rate.
- Score: 26.51949485387526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How quickly can a given class of concepts be learned from examples? It is
common to measure the performance of a supervised machine learning algorithm by
plotting its "learning curve", that is, the decay of the error rate as a
function of the number of training examples. However, the classical theoretical
framework for understanding learnability, the PAC model of Vapnik-Chervonenkis
and Valiant, does not explain the behavior of learning curves: the
distribution-free PAC model of learning can only bound the upper envelope of
the learning curves over all possible data distributions. This does not match
the practice of machine learning, where the data source is typically fixed in
any given scenario, while the learner may choose the number of training
examples on the basis of factors such as computational resources and desired
accuracy.
In this paper, we study an alternative learning model that better captures
such practical aspects of machine learning, but still gives rise to a complete
theory of the learnable in the spirit of the PAC model. More precisely, we
consider the problem of universal learning, which aims to understand the
performance of learning algorithms on every data distribution, but without
requiring uniformity over the distribution. The main result of this paper is a
remarkable trichotomy: there are only three possible rates of universal
learning. More precisely, we show that the learning curves of any given concept
class decay at an exponential, linear, or arbitrarily slow rate.
Moreover, each of these cases is completely characterized by appropriate
combinatorial parameters, and we exhibit optimal learning algorithms that
achieve the best possible rate in each case.
For concreteness, we consider in this paper only the realizable case, though
analogous results are expected to extend to more general learning scenarios.
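To make the notion of a learning curve concrete, the following is a minimal sketch, not taken from the paper: it estimates the learning curve of a toy concept class (thresholds on [0, 1]) under a single fixed data distribution with a simple consistent learner, and prints the empirical error next to a 1/n reference. The concept class, distribution, learner, sample sizes, and the 1/n reference curve are all illustrative assumptions, not the paper's constructions; the trichotomy concerns how such curves decay for every fixed distribution (exponential-type, 1/n-type, or arbitrarily slow), rather than the worst case over distributions.

```python
# Minimal sketch (illustrative, not from the paper): empirical learning curve
# of a toy threshold class under one fixed data distribution.
import numpy as np

rng = np.random.default_rng(0)
T_TRUE = 0.3  # threshold of the (assumed) target concept


def target(x):
    # Target concept: the threshold function 1{x >= T_TRUE} on [0, 1].
    return (x >= T_TRUE).astype(int)


def learn_threshold(x, y):
    # Consistent learner: midpoint between the largest negative and the
    # smallest positive training point.
    pos, neg = x[y == 1], x[y == 0]
    lo = neg.max() if neg.size else 0.0
    hi = pos.min() if pos.size else 1.0
    return (lo + hi) / 2.0


def error_rate(t_hat, m=100_000):
    # Monte Carlo estimate of the error under the fixed (uniform) distribution.
    x = rng.uniform(0.0, 1.0, m)
    return np.mean((x >= t_hat) != (x >= T_TRUE))


for n in [2, 4, 8, 16, 32, 64, 128, 256]:
    errs = []
    for _ in range(200):  # average over independent training sets of size n
        x = rng.uniform(0.0, 1.0, n)
        errs.append(error_rate(learn_threshold(x, target(x))))
    print(f"n={n:4d}  empirical error={np.mean(errs):.4f}  1/n reference={1.0 / n:.4f}")
```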
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - Reciprocal Learning [0.0]
We show that a wide array of machine learning algorithms are specific instances of one single paradigm: reciprocal learning.
We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory.
We find that reciprocal learning algorithms converge at linear rates to an approximately optimal model under relatively mild assumptions on the loss function.
arXiv Detail & Related papers (2024-08-12T16:14:52Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z) - Ticketed Learning-Unlearning Schemes [57.89421552780526]
We propose a new ticketed model for learning-unlearning.
We provide space-efficient ticketed learning-unlearning schemes for a broad family of concept classes.
arXiv Detail & Related papers (2023-06-27T18:54:40Z) - Multiclass classification utilising an estimated algorithmic probability
prior [0.5156484100374058]
We study how algorithmic information theory, especially algorithmic probability, may aid in a machine learning task.
This work is one of the first to demonstrate how algorithmic probability can aid in a concrete, real-world machine learning problem.
arXiv Detail & Related papers (2022-12-14T07:50:12Z) - Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets [2.824895388993495]
We provide theoretical guarantees for reliable learning under the information-theoretic Asymptotic Equipartition Property (AEP).
We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning.
Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity.
arXiv Detail & Related papers (2022-09-28T17:33:11Z) - Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z) - Characterizing and overcoming the greedy nature of learning in
multi-modal deep neural networks [62.48782506095565]
We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
arXiv Detail & Related papers (2022-02-10T20:11:21Z) - On the Theory of Reinforcement Learning with Once-per-Episode Feedback [120.5537226120512]
We introduce a theory of reinforcement learning in which the learner receives feedback only once at the end of an episode.
This is arguably more representative of real-world applications than the traditional requirement that the learner receive feedback at every time step.
arXiv Detail & Related papers (2021-05-29T19:48:51Z) - The Shape of Learning Curves: a Review [14.764764847928259]
This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation.
We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential; a toy curve-fitting sketch for these two shapes follows after this list.
We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data.
arXiv Detail & Related papers (2021-03-19T17:56:33Z)
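As a companion to the learning-curve review above, here is a minimal sketch, not taken from any of the papers listed, of fitting the two shapes it highlights, a power law and an exponential, to a learning curve. The data points, initial parameter guesses, and the use of scipy.optimize.curve_fit are illustrative assumptions.

```python
# Minimal sketch (illustrative): fit power-law and exponential models to a
# hypothetical learning curve and compare their residuals.
import numpy as np
from scipy.optimize import curve_fit


def power_law(n, a, b):
    # error(n) ~ a * n^(-b)
    return a * np.power(n, -b)


def exponential(n, a, c):
    # error(n) ~ a * exp(-c * n)
    return a * np.exp(-c * n)


# Hypothetical learning-curve measurements: training-set sizes and error rates.
n = np.array([10, 20, 40, 80, 160, 320, 640], dtype=float)
err = np.array([0.30, 0.21, 0.15, 0.11, 0.08, 0.055, 0.04])

(p_a, p_b), _ = curve_fit(power_law, n, err, p0=[1.0, 0.5], bounds=(0, np.inf))
(e_a, e_c), _ = curve_fit(exponential, n, err, p0=[0.3, 0.01], bounds=(0, np.inf))

# Compare goodness of fit via residual sum of squares.
rss_pow = np.sum((err - power_law(n, p_a, p_b)) ** 2)
rss_exp = np.sum((err - exponential(n, e_a, e_c)) ** 2)
print(f"power law:   a={p_a:.3f} b={p_b:.3f}  RSS={rss_pow:.2e}")
print(f"exponential: a={e_a:.3f} c={e_c:.4f}  RSS={rss_exp:.2e}")
```

The lower residual sum of squares indicates which of the two shapes better describes the (hypothetical) curve; on real data, the review's point is that both shapes occur, and that ill-behaved curves also exist.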