A Theory of Universal Learning
- URL: http://arxiv.org/abs/2011.04483v1
- Date: Mon, 9 Nov 2020 15:10:32 GMT
- Title: A Theory of Universal Learning
- Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Ramon van Handel, Amir
Yehudayoff
- Abstract summary: We show that there are only three possible rates of universal learning.
We show that the learning curves of any given concept class decay at an exponential, linear, or arbitrarily slow rate.
- Score: 26.51949485387526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How quickly can a given class of concepts be learned from examples? It is
common to measure the performance of a supervised machine learning algorithm by
plotting its "learning curve", that is, the decay of the error rate as a
function of the number of training examples. However, the classical theoretical
framework for understanding learnability, the PAC model of Vapnik-Chervonenkis
and Valiant, does not explain the behavior of learning curves: the
distribution-free PAC model of learning can only bound the upper envelope of
the learning curves over all possible data distributions. This does not match
the practice of machine learning, where the data source is typically fixed in
any given scenario, while the learner may choose the number of training
examples on the basis of factors such as computational resources and desired
accuracy.
In this paper, we study an alternative learning model that better captures
such practical aspects of machine learning, but still gives rise to a complete
theory of the learnable in the spirit of the PAC model. More precisely, we
consider the problem of universal learning, which aims to understand the
performance of learning algorithms on every data distribution, but without
requiring uniformity over the distribution. The main result of this paper is a
remarkable trichotomy: there are only three possible rates of universal
learning. More precisely, we show that the learning curves of any given concept
class decay at an exponential, linear, or arbitrarily slow rate.
Moreover, each of these cases is completely characterized by appropriate
combinatorial parameters, and we exhibit optimal learning algorithms that
achieve the best possible rate in each case.
For concreteness, we consider in this paper only the realizable case, though
analogous results are expected to extend to more general learning scenarios.
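To make the notion of a learning curve concrete, the following is a minimal sketch, not taken from the paper: it estimates the learning curve of a toy concept class (thresholds on [0, 1]) under a single fixed data distribution with a simple consistent learner, and prints the empirical error next to a 1/n reference. The concept class, distribution, learner, sample sizes, and the 1/n reference curve are all illustrative assumptions, not the paper's constructions; the trichotomy concerns how such curves decay for every fixed distribution (exponential-type, 1/n-type, or arbitrarily slow), rather than the worst case over distributions.

```python
# Minimal sketch (illustrative, not from the paper): empirical learning curve
# of a toy threshold class under one fixed data distribution.
import numpy as np

rng = np.random.default_rng(0)
T_TRUE = 0.3  # threshold of the (assumed) target concept


def target(x):
    # Target concept: the threshold function 1{x >= T_TRUE} on [0, 1].
    return (x >= T_TRUE).astype(int)


def learn_threshold(x, y):
    # Consistent learner: midpoint between the largest negative and the
    # smallest positive training point.
    pos, neg = x[y == 1], x[y == 0]
    lo = neg.max() if neg.size else 0.0
    hi = pos.min() if pos.size else 1.0
    return (lo + hi) / 2.0


def error_rate(t_hat, m=100_000):
    # Monte Carlo estimate of the error under the fixed (uniform) distribution.
    x = rng.uniform(0.0, 1.0, m)
    return np.mean((x >= t_hat) != (x >= T_TRUE))


for n in [2, 4, 8, 16, 32, 64, 128, 256]:
    errs = []
    for _ in range(200):  # average over independent training sets of size n
        x = rng.uniform(0.0, 1.0, n)
        errs.append(error_rate(learn_threshold(x, target(x))))
    print(f"n={n:4d}  empirical error={np.mean(errs):.4f}  1/n reference={1.0 / n:.4f}")
```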
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - Reciprocal Learning [0.0]
We show that a wide array of machine learning algorithms are specific instances of one single paradigm: reciprocal learning.
We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory.
We find that reciprocal learning algorithms converge at linear rates to an approximately optimal model under relatively mild assumptions on the loss function.
arXiv Detail & Related papers (2024-08-12T16:14:52Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z) - Ticketed Learning-Unlearning Schemes [57.89421552780526]
We propose a new ticketed model for learning-unlearning.
We provide space-efficient ticketed learning-unlearning schemes for a broad family of concept classes.
arXiv Detail & Related papers (2023-06-27T18:54:40Z) - Multiclass classification utilising an estimated algorithmic probability
prior [0.5156484100374058]
We study how algorithmic information theory, especially algorithmic probability, may aid in a machine learning task.
This work is one of the first to demonstrate how algorithmic probability can aid in a concrete, real-world machine learning problem.
arXiv Detail & Related papers (2022-12-14T07:50:12Z) - Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets [2.824895388993495]
We provide theoretical guarantees for reliable learning under the information-theoretic Asymptotic Equipartition Property (AEP).
We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning.
Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity.
arXiv Detail & Related papers (2022-09-28T17:33:11Z) - Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z) - Characterizing and overcoming the greedy nature of learning in
multi-modal deep neural networks [62.48782506095565]
We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
arXiv Detail & Related papers (2022-02-10T20:11:21Z) - On the Theory of Reinforcement Learning with Once-per-Episode Feedback [120.5537226120512]
We introduce a theory of reinforcement learning in which the learner receives feedback only once at the end of an episode.
This is arguably more representative of real-world applications than the traditional requirement that the learner receive feedback at every time step.
arXiv Detail & Related papers (2021-05-29T19:48:51Z) - The Shape of Learning Curves: a Review [14.764764847928259]
This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation.
We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential; a toy curve-fitting sketch for these two shapes follows after this list.
We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data.
arXiv Detail & Related papers (2021-03-19T17:56:33Z)
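As a companion to the learning-curve review above, here is a minimal sketch, not taken from any of the papers listed, of fitting the two shapes it highlights, a power law and an exponential, to a learning curve. The data points, initial parameter guesses, and the use of scipy.optimize.curve_fit are illustrative assumptions.

```python
# Minimal sketch (illustrative): fit power-law and exponential models to a
# hypothetical learning curve and compare their residuals.
import numpy as np
from scipy.optimize import curve_fit


def power_law(n, a, b):
    # error(n) ~ a * n^(-b)
    return a * np.power(n, -b)


def exponential(n, a, c):
    # error(n) ~ a * exp(-c * n)
    return a * np.exp(-c * n)


# Hypothetical learning-curve measurements: training-set sizes and error rates.
n = np.array([10, 20, 40, 80, 160, 320, 640], dtype=float)
err = np.array([0.30, 0.21, 0.15, 0.11, 0.08, 0.055, 0.04])

(p_a, p_b), _ = curve_fit(power_law, n, err, p0=[1.0, 0.5], bounds=(0, np.inf))
(e_a, e_c), _ = curve_fit(exponential, n, err, p0=[0.3, 0.01], bounds=(0, np.inf))

# Compare goodness of fit via residual sum of squares.
rss_pow = np.sum((err - power_law(n, p_a, p_b)) ** 2)
rss_exp = np.sum((err - exponential(n, e_a, e_c)) ** 2)
print(f"power law:   a={p_a:.3f} b={p_b:.3f}  RSS={rss_pow:.2e}")
print(f"exponential: a={e_a:.3f} c={e_c:.4f}  RSS={rss_exp:.2e}")
```

The lower residual sum of squares indicates which of the two shapes better describes the (hypothetical) curve; on real data, the review's point is that both shapes occur, and that ill-behaved curves also exist.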