Compression, Generalization and Learning
- URL: http://arxiv.org/abs/2301.12767v2
- Date: Mon, 8 Jan 2024 11:20:43 GMT
- Title: Compression, Generalization and Learning
- Authors: Marco C. Campi and Simone Garatti
- Abstract summary: A compression function is a map that slims down an observational set into a subset of reduced size.
In multiple applications, the condition that one new observation makes the compressed set change is interpreted that this observation brings in extra information.
In this paper, we lay the foundations of a new theory that allows one to keep control on the probability of change of compression.
- Score: 3.045851438458641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A compression function is a map that slims down an observational set into a
subset of reduced size, while preserving its informational content. In multiple
applications, the condition that one new observation makes the compressed set
change is interpreted that this observation brings in extra information and, in
learning theory, this corresponds to misclassification, or misprediction. In
this paper, we lay the foundations of a new theory that allows one to keep
control on the probability of change of compression (which maps into the
statistical "risk" in learning applications). Under suitable conditions, the
cardinality of the compressed set is shown to be a consistent estimator of the
probability of change of compression (without any upper limit on the size of
the compressed set); moreover, unprecedentedly tight finite-sample bounds to
evaluate the probability of change of compression are obtained under a
generally applicable condition of preference. All results are usable in a fully
agnostic setup, i.e., without requiring any a priori knowledge on the
probability distribution of the observations. Not only these results offer a
valid support to develop trust in observation-driven methodologies, they also
play a fundamental role in learning techniques as a tool for hyper-parameter
tuning.
Related papers
- Problem-dependent convergence bounds for randomized linear gradient compression [4.656302602746228]
In distributed optimization, the communication model updates can be a performance bottleneck.
gradient compression has been proposed as a means of increasing optimization.
We study how the impact of compression on throughput can be in terms of the norm of the Hessian objective.
arXiv Detail & Related papers (2024-11-19T22:26:42Z) - Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z) - Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - On the Interaction Between Differential Privacy and Gradient Compression
in Deep Learning [55.22219308265945]
We study how the Gaussian mechanism for differential privacy and gradient compression jointly impact test accuracy in deep learning.
We observe while gradient compression generally has a negative impact on test accuracy in non-private training, it can sometimes improve test accuracy in differentially private training.
arXiv Detail & Related papers (2022-11-01T20:28:45Z) - Approximability and Generalisation [0.0]
We study the role of approximability in learning, both in the full precision and the approximated settings of the predictor.
We show that under mild conditions, approximable target concepts are learnable from a smaller labelled sample.
We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees.
arXiv Detail & Related papers (2022-03-15T15:21:48Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of
Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Statistical optimality conditions for compressive ensembles [7.766921168069532]
We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data.
We introduce a general distribution-dependent upper-bound on the excess risk, framed in terms of a natural notion of compressibility.
We then instantiate this general bound to classification and regression tasks, considering Johnson-Lindenstrauss mappings as the compression scheme.
arXiv Detail & Related papers (2021-06-02T11:52:31Z) - Deconfounding Scores: Feature Representations for Causal Effect
Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z) - Learning, compression, and leakage: Minimising classification error via
meta-universal compression principles [87.054014983402]
A promising group of compression techniques for learning scenarios is normalised maximum likelihood (NML) coding.
Here we consider a NML-based decision strategy for supervised classification problems, and show that it attains PAC learning when applied to a wide variety of models.
We show that the misclassification rate of our method is upper bounded by the maximal leakage, a recently proposed metric to quantify the potential of data leakage in privacy-sensitive scenarios.
arXiv Detail & Related papers (2020-10-14T20:03:58Z) - On Compression Principle and Bayesian Optimization for Neural Networks [0.0]
We propose a compression principle that states that an optimal predictive model is the one that minimizes a total compressed message length of all data and model definition while guarantees decodability.
We show that dropout can be used for a continuous dimensionality reduction that allows to find optimal network dimensions as required by the compression principle.
arXiv Detail & Related papers (2020-06-23T03:23:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.