Margin Maximization as Lossless Maximal Compression
- URL: http://arxiv.org/abs/2001.10318v1
- Date: Tue, 28 Jan 2020 13:40:22 GMT
- Title: Margin Maximization as Lossless Maximal Compression
- Authors: Nikolaos Nikolaou, Henry Reeve, Gavin Brown
- Abstract summary: In classification, functional margin maximization -- correctly classifying as many training examples as possible with maximal confidence -- has been known to construct models with good generalization guarantees.
This work gives an information-theoretic interpretation of a margin maximizing model on a noiseless dataset as one that achieves lossless maximal compression of said dataset.
- Score: 0.3007949058551534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ultimate goal of a supervised learning algorithm is to produce models
constructed on the training data that can generalize well to new examples. In
classification, functional margin maximization -- correctly classifying as many
training examples as possible with maximal confidence -- has been known to
construct models with good generalization guarantees. This work gives an
information-theoretic interpretation of a margin maximizing model on a
noiseless training dataset as one that achieves lossless maximal compression of
said dataset -- i.e. extracts from the features all the useful information for
predicting the label and no more. The connection offers new insights on
generalization in supervised machine learning, showing margin maximization as a
special case (that of classification) of a more general principle, and explaining
the success and potential limitations of popular learning algorithms like
the success and potential limitations of popular learning algorithms like
gradient boosting. We support our observations with theoretical arguments and
empirical evidence and identify interesting directions for future work.
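To make the quantity at stake concrete, the sketch below computes functional margins $y_i f(x_i)$ for a simple linear scorer on a toy noiseless dataset; margin maximization means driving every margin positive (correct classification) and as large as possible (maximal confidence), which the paper reinterprets as extracting exactly the label-relevant information from the features. The toy data and the hand-picked scorer are our own illustrative assumptions, not material from the paper.

```python
import numpy as np

# Toy noiseless binary dataset with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 2.0 * X[:, 1] > 0, 1, -1)  # linearly separable labelling rule

def functional_margins(w, b, X, y):
    """Functional margin of each example, y_i * f(x_i), for f(x) = w.x + b."""
    return y * (X @ w + b)

# A hand-picked linear scorer standing in for a trained margin maximizer.
w, b = np.array([1.0, 2.0]), 0.0
margins = functional_margins(w, b, X, y)

print("fraction correctly classified:", np.mean(margins > 0))  # 1.0 on this toy set
print("minimum functional margin:", margins.min())  # margin maximization pushes this up
```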
Related papers
- Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach [8.875278412741695]
Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data.
We develop an algorithm that achieves near-instantaneous unlearning as it only requires a vector addition operation.
arXiv Detail & Related papers (2024-04-02T07:54:18Z)
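The unlearning entry above reports that forgetting a sample reduces to a single vector addition. As a loose sketch of that interface only (the recollection vectors, their precomputation, and the function below are our assumptions, not the paper's algorithm), one can picture a precomputed per-sample vector that is subtracted from the parameters on an unlearning request:

```python
import numpy as np

def unlearn(theta, recollection_vectors, sample_id):
    """Hypothetical interface: forget one training sample via a single vector operation.

    recollection_vectors[sample_id] is assumed to have been precomputed during
    training to approximate that sample's contribution to the final parameters.
    """
    return theta - recollection_vectors[sample_id]

theta = np.array([0.5, -1.2, 3.0])                          # current model parameters
recollection_vectors = {42: np.array([0.01, -0.02, 0.03])}  # precomputed offline
print(unlearn(theta, recollection_vectors, sample_id=42))
```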
- Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
arXiv Detail & Related papers (2023-11-17T00:35:38Z)
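The $k$-NN entry above takes an information-theoretic view of nearest neighbours. One minimal reading of that idea (our illustrative assumption, not the paper's exact construction) is to score a query label by its surprisal, $-\log_2 \hat p$, under the empirical label distribution of the $k$ nearest neighbours; unusually high surprisal can serve as an anomaly signal.

```python
import numpy as np

def knn_surprisal(X_train, y_train, x_query, y_query, k=5):
    """Surprisal -log2(p_hat) of y_query under the k nearest neighbours' labels."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nn_labels = y_train[np.argsort(dists)[:k]]
    n_classes = len(np.unique(y_train))
    p_hat = (np.sum(nn_labels == y_query) + 1) / (k + n_classes)  # Laplace smoothing
    return -np.log2(p_hat)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] > 0).astype(int)
query = np.array([2.0, 0.0])
print(knn_surprisal(X_train, y_train, query, y_query=1))  # expected label: low surprisal
print(knn_surprisal(X_train, y_train, query, y_query=0))  # surprising label: high surprisal
```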
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
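The HyperImpute entry above describes iterative, column-wise imputation with automatic per-column model selection. The toy loop below keeps only the skeleton of that idea, choosing between a column-mean fill and a linear regression on the other columns by their error on observed entries; the actual framework selects among a library of learners and simulators.

```python
import numpy as np

def iterative_impute(X, n_iters=5):
    """Toy column-wise iterative imputation with a minimal model-selection step."""
    X = X.copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # initial mean fill
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            obs = ~miss
            others = np.delete(X, j, axis=1)
            A_obs = np.c_[others[obs], np.ones(obs.sum())]
            coef, *_ = np.linalg.lstsq(A_obs, X[obs, j], rcond=None)
            # "Model selection": keep the regression only if it beats the mean fill.
            if np.mean((A_obs @ coef - X[obs, j]) ** 2) < np.var(X[obs, j]):
                A_miss = np.c_[others[miss], np.ones(miss.sum())]
                X[miss, j] = A_miss @ coef
    return X

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=50)   # column 2 is predictable from column 0
X[rng.random(X.shape) < 0.2] = np.nan
print(np.isnan(iterative_impute(X)).sum())      # 0: no missing values remain
```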
- A Generalized Weighted Optimization Method for Computational Learning and Inversion [15.535124460414588]
We analyze a generalized weighted least-squares optimization method for computational learning and inversion with noisy data.
We characterize the impact of the weighting scheme on the generalization error of the learning method.
We demonstrate that appropriate weighting from prior knowledge can improve the generalization capability of the learned model.
arXiv Detail & Related papers (2022-01-23T10:35:34Z)
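The weighted-optimization entry above studies how the weighting scheme in a least-squares objective affects generalization. For reference, a generic weighted least-squares solve looks like the following; the weights here are arbitrary and do not reflect the paper's specific scheme.

```python
import numpy as np

def weighted_least_squares(A, b, w):
    """Solve min_x sum_i w_i * (A_i . x - b_i)^2 via the weighted normal equations."""
    W = np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 2))
x_true = np.array([1.0, -2.0])
b = A @ x_true + 0.1 * rng.normal(size=30)
w = np.ones(30)
w[:5] = 10.0  # up-weight five measurements assumed to be more trustworthy
print(weighted_least_squares(A, b, w))  # close to x_true
```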
- Learning to Generate Novel Classes for Deep Metric Learning [24.048915378172012]
We introduce a new data augmentation approach that synthesizes novel classes and their embedding vectors.
We implement this idea by learning and exploiting a conditional generative model, which, given a class label and a noise, produces a random embedding vector of the class.
Our proposed generator allows the loss to use richer class relations by augmenting realistic and diverse classes, resulting in better generalization to unseen samples.
arXiv Detail & Related papers (2022-01-04T06:55:19Z)
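The metric-learning entry above relies on a conditional generator that maps a class label and a noise vector to an embedding of that class. The snippet below only illustrates that interface with fixed random parameters (a class prototype table plus a projected noise perturbation); in the paper this generator is learned jointly with the embedding model so that it can also synthesize novel classes.

```python
import numpy as np

rng = np.random.default_rng(4)
n_classes, noise_dim, embed_dim = 10, 8, 16

class_table = rng.normal(size=(n_classes, embed_dim))       # one prototype per class
noise_proj = 0.1 * rng.normal(size=(noise_dim, embed_dim))  # noise-to-embedding map

def generate_embedding(label, noise):
    """Toy conditional generator: class prototype perturbed by projected noise."""
    return class_table[label] + noise @ noise_proj

z = rng.normal(size=noise_dim)
print(generate_embedding(label=3, noise=z).shape)  # (16,): a random embedding of class 3
```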
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can reach final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
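The feature-extrapolation entry above has a graph neural network infer embeddings for previously unseen feature columns by message passing over a feature-data graph. The snippet below shows only the basic aggregation idea on a toy bipartite graph (a new feature's embedding as the mean of the embeddings of the data points it connects to); it is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(5)
n_points, d = 6, 3
point_emb = rng.normal(size=(n_points, d))                # embeddings of observed data points
new_feature = (rng.random(n_points) > 0.5).astype(float)  # values of one unseen feature column

# One message-passing step: aggregate embeddings of the data points the new
# feature is connected to (its nonzero entries) to get an embedding for it.
connected = new_feature > 0
new_feat_emb = point_emb[connected].mean(axis=0) if connected.any() else np.zeros(d)
print(new_feat_emb)
```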
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
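As background on the objective used in the entry above (standard material, not specific to the paper), the Cauchy-Schwarz divergence between densities $p$ and $q$ is

$$ D_{\mathrm{CS}}(p, q) = -\log \frac{\int p(x)\, q(x)\, dx}{\sqrt{\int p(x)^2\, dx \int q(x)^2\, dx}}, $$

which is nonnegative, equals zero only when $p = q$ almost everywhere, and, unlike the KL divergence, admits a closed form when $p$ and $q$ are Gaussian mixtures; that analytic tractability is what makes it attractive as a VAE regularizer here.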
- Fairness in Streaming Submodular Maximization: Algorithms and Hardness [20.003009252240222]
We develop the first streaming approximation for submodular maximization under fairness constraints, for both monotone and non-monotone functions.
We validate our findings on DPP-based movie recommendation, clustering-based summarization, and maximum coverage in social networks.
arXiv Detail & Related papers (2020-10-14T22:57:07Z)
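As a rough illustration of the setting in the last entry (a naive greedy heuristic under assumed group quotas, not the paper's algorithms, and with no approximation guarantee), the sketch below keeps a streamed item only if it adds new coverage and its group's quota is not yet exhausted.

```python
def stream_select(stream, group_of, group_caps, k):
    """Toy one-pass selection for a coverage objective with per-group caps."""
    selected, covered = [], set()
    used = {g: 0 for g in group_caps}
    for item, elements in stream:
        gain = len(set(elements) - covered)
        g = group_of[item]
        if len(selected) < k and used[g] < group_caps[g] and gain > 0:
            selected.append(item)
            covered |= set(elements)
            used[g] += 1
    return selected, covered

stream = [("a", {1, 2}), ("b", {2, 3}), ("c", {4}), ("d", {5, 6})]
group_of = {"a": "g1", "b": "g1", "c": "g2", "d": "g2"}
print(stream_select(stream, group_of, group_caps={"g1": 1, "g2": 2}, k=3))
```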
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.