Approximability and Generalisation
- URL: http://arxiv.org/abs/2203.07989v1
- Date: Tue, 15 Mar 2022 15:21:48 GMT
- Title: Approximability and Generalisation
- Authors: Andrew J. Turner and Ata Kabán
- Abstract summary: We study the role of approximability in learning, both in the full precision and the approximated settings of the predictor.
We show that under mild conditions, approximable target concepts are learnable from a smaller labelled sample.
We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approximate learning machines have become popular in the era of small
devices, including quantised, factorised, hashed, or otherwise compressed
predictors, and the quest to explain and guarantee good generalisation
abilities for such methods has just begun. In this paper we study the role of
approximability in learning, both in the full precision and the approximated
settings of the predictor that is learned from the data, through a notion of
sensitivity of predictors to the action of the approximation operator at hand.
We prove upper bounds on the generalisation of such predictors, yielding the
following main findings, for any PAC-learnable class and any given
approximation operator. 1) We show that under mild conditions, approximable
target concepts are learnable from a smaller labelled sample, provided
sufficient unlabelled data. 2) We give algorithms that guarantee a good
predictor whose approximation also enjoys the same generalisation guarantees.
3) We highlight natural examples of structure in the class of sensitivities,
which reduce, and possibly even eliminate, the otherwise substantial requirement
of additional unlabelled data, thereby shedding new light on what makes one
problem instance easier to learn than another. These results embed the scope of
modern model compression approaches into the general goal of statistical
learning theory, which in turn suggests appropriate algorithms through
minimising uniform bounds.
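The abstract's key quantity is the sensitivity of a predictor to a given approximation operator (quantisation, factorisation, hashing, etc.), which can be estimated from unlabelled data. The sketch below is only illustrative and is not the paper's formal definition: it assumes a linear predictor, uses uniform weight quantisation as the approximation operator, and takes the mean output discrepancy on an unlabelled sample as a rough empirical sensitivity estimate; all names and the specific form of the estimate are assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantise(w, n_bits=4):
    """Uniform quantisation of a weight vector: one concrete example of an
    approximation operator acting on a (linear) predictor."""
    scale = float(np.max(np.abs(w))) or 1.0
    levels = 2 ** (n_bits - 1) - 1
    return np.round(w / scale * levels) / levels * scale

def empirical_sensitivity(w, X, n_bits=4):
    """Average output discrepancy between the predictor and its quantised
    version on an unlabelled sample X -- a stand-in for the paper's
    sensitivity quantity, computable without labels."""
    return float(np.mean(np.abs(X @ w - X @ quantise(w, n_bits))))

# Unlabelled data and a full-precision predictor (e.g. returned by ERM).
X_unlabelled = rng.standard_normal((1000, 20))
w_hat = rng.standard_normal(20)

for bits in (2, 4, 8):
    print(f"{bits}-bit quantisation: empirical sensitivity ~ "
          f"{empirical_sensitivity(w_hat, X_unlabelled, bits):.4f}")
```

In the abstract's framing, an unlabelled-sample estimate of this kind is what allows approximable predictors to be learned from a smaller labelled sample, with the unlabelled data absorbing the cost of controlling the sensitivity.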
Related papers
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z) - Minimum Description Length and Generalization Guarantees for Representation Learning [16.2444595840653]
This paper presents a framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm.
Rather than the mutual information between the encoder's input and the representation, our new bounds involve the "multi-letter" relative entropy.
To the best of the authors' knowledge, the established generalization bounds are the first of their kind for Information Bottleneck (IB) type encoders and representation learning.
arXiv Detail & Related papers (2024-02-05T18:12:28Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Predictive Coding beyond Correlations [59.47245250412873]
We show how one such algorithm, predictive coding, can perform causal inference tasks.
First, we show how a simple change in the inference process of predictive coding makes it possible to compute interventions without the need to mutilate or redefine a causal graph.
arXiv Detail & Related papers (2023-06-27T13:57:16Z) - Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach [5.371816551086118]
We develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream.
Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters.
arXiv Detail & Related papers (2023-05-11T05:44:36Z) - Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within the class.
Experiments show that our algorithm learns valid prediction sets and significantly improves efficiency.
arXiv Detail & Related papers (2022-02-22T18:37:23Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex training problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)