Margin-Based Transfer Bounds for Meta Learning with Deep Feature
Embedding
- URL: http://arxiv.org/abs/2012.01602v1
- Date: Wed, 2 Dec 2020 23:50:51 GMT
- Title: Margin-Based Transfer Bounds for Meta Learning with Deep Feature
Embedding
- Authors: Jiechao Guan, Zhiwu Lu, Tao Xiang, Timothy Hospedales
- Abstract summary: We leverage margin theory and statistical learning theory to establish three margin-based transfer bounds for meta-learning based multiclass classification (MLMC).
These bounds reveal that the expected error of a given classification algorithm for a future task can be estimated with the average empirical error on a finite number of previous tasks.
Experiments on three benchmarks show that these margin-based models still achieve competitive performance.
- Score: 67.09827634481712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By transferring knowledge learned from seen/previous tasks, meta learning
aims to generalize well to unseen/future tasks. Existing meta-learning
approaches have shown promising empirical performance on various multiclass
classification problems, but few provide theoretical analysis on the
classifiers' generalization ability on future tasks. In this paper, under the
assumption that all classification tasks are sampled from the same
meta-distribution, we leverage margin theory and statistical learning theory to
establish three margin-based transfer bounds for meta-learning based multiclass
classification (MLMC). These bounds reveal that the expected error of a given
classification algorithm for a future task can be estimated with the average
empirical error on a finite number of previous tasks, uniformly over a class of
preprocessing feature maps/deep neural networks (i.e. deep feature embeddings).
To validate these bounds, instead of the commonly-used cross-entropy loss, a
multi-margin loss is employed to train a number of representative MLMC models.
Experiments on three benchmarks show that these margin-based models still
achieve competitive performance, validating the practical value of our
margin-based theoretical analysis.
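To make the shape of these results concrete, here is a schematic version of a margin-based transfer bound, written in our own notation; the paper's three bounds differ in their exact capacity terms and constants.

```latex
% Schematic margin-based transfer bound (our notation, not the paper's exact
% statement). With probability at least 1 - \delta over n tasks drawn i.i.d.
% from the meta-distribution \tau, each with m training samples:
\[
  \mathbb{E}_{T \sim \tau}\big[\operatorname{er}(h_T)\big]
  \;\le\;
  \frac{1}{n}\sum_{i=1}^{n} \widehat{\operatorname{er}}_{\gamma}(h_{T_i})
  \;+\; \mathcal{O}\!\Big(\frac{C(\Phi)}{\gamma\sqrt{m}}\Big)
  \;+\; \mathcal{O}\!\Big(\sqrt{\tfrac{\log(1/\delta)}{n}}\Big)
\]
% Here \widehat{\operatorname{er}}_{\gamma} is the empirical margin error at
% margin \gamma on a previous task, and C(\Phi) is a capacity measure of the
% class \Phi of deep feature embeddings, over which the bound holds uniformly.
```

On the training side, a multi-class margin loss of the kind substituted for cross-entropy is available off the shelf in PyTorch. A minimal sketch, with a stand-in embedding network and dummy data (`encoder`, `head`, and all hyperparameters are illustrative, not the authors' configuration):

```python
import torch
import torch.nn as nn

embed_dim, num_classes = 64, 5
# Stand-in "deep feature embedding" and linear classification head.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, embed_dim))
head = nn.Linear(embed_dim, num_classes)

criterion = nn.MultiMarginLoss(margin=1.0)  # hinge-style multiclass margin loss
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(32, 784)                    # dummy episode batch
y = torch.randint(0, num_classes, (32,))
logits = head(encoder(x))
loss = criterion(logits, y)                 # mean of max(0, margin - s_y + s_k) over k != y
optimizer.zero_grad()
loss.backward()
optimizer.step()
```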
Related papers
- Understanding Transfer Learning and Gradient-Based Meta-Learning
Techniques [5.2997197698288945]
We investigate performance differences between finetuning, MAML, and another meta-learning technique called Reptile.
Our findings show that both the output layer and the noisy training conditions induced by data scarcity play important roles in facilitating this specialization for MAML.
We show that the pre-trained features as obtained by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile.
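For readers unfamiliar with the gradient-based techniques being compared, a single second-order MAML step looks roughly as follows. This is a generic sketch with a toy linear model and dummy tasks, not the paper's experimental setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 5)                     # toy meta-learned initialization
inner_lr, outer_lr = 0.01, 0.001
opt = torch.optim.Adam(model.parameters(), lr=outer_lr)
loss_fn = nn.CrossEntropyLoss()

xs, ys = torch.randn(8, 10), torch.randint(0, 5, (8,))  # support set
xq, yq = torch.randn(8, 10), torch.randint(0, 5, (8,))  # query set

# Inner loop: adapt a fast copy of the weights on the support set.
fast = dict(model.named_parameters())
grads = torch.autograd.grad(
    loss_fn(F.linear(xs, fast['weight'], fast['bias']), ys),
    list(fast.values()), create_graph=True)  # keep graph for second-order step
fast = {name: p - inner_lr * g for (name, p), g in zip(fast.items(), grads)}

# Outer loop: query loss through the adapted weights updates the initialization.
outer_loss = loss_fn(F.linear(xq, fast['weight'], fast['bias']), yq)
opt.zero_grad()
outer_loss.backward()
opt.step()
```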
arXiv Detail & Related papers (2023-10-09T20:51:49Z) - On Interpretable Approaches to Cluster, Classify and Represent
Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion
Theory [0.0]
Clustering, classification, and representation are three fundamental objectives of learning from high-dimensional data with intrinsic structure.
This paper introduces three interpretable approaches, i.e., segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion, and representation via the Maximal Coding Rate Reduction criterion.
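All three criteria are built on a rate-distortion coding rate. Below is a minimal sketch of the coding-rate function as commonly stated in the MCR^2 literature (our simplification; the paper's exact criteria add class-conditional terms on top of it):

```python
import torch

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Approximate bits needed to encode the columns of Z (d x m)
    up to distortion eps: R(Z, eps) = 1/2 logdet(I + d/(m*eps^2) Z Z^T)."""
    d, m = Z.shape
    I = torch.eye(d)
    return 0.5 * torch.logdet(I + (d / (m * eps ** 2)) * (Z @ Z.T))

Z = torch.randn(16, 100)  # 100 feature vectors of dimension 16
print(coding_rate(Z))
```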
arXiv Detail & Related papers (2023-02-21T01:15:08Z) - Synergies between Disentanglement and Sparsity: Generalization and
Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
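As a rough illustration of the mechanism, the sketch below uses a single-level L1-penalized surrogate (our simplification of the paper's bi-level formulation): task-specific linear heads on a shared representation are pushed to use few latent dimensions, which is the sparsity the identifiability result ties to disentanglement.

```python
import torch
import torch.nn as nn

shared = nn.Linear(32, 16)                                   # shared representation
heads = nn.ModuleList([nn.Linear(16, 1) for _ in range(4)])  # one head per task
l1 = 1e-3                                                    # sparsity strength

x = torch.randn(64, 32)
ys = [torch.randn(64, 1) for _ in heads]                     # dummy per-task targets
z = shared(x)
loss = sum(nn.functional.mse_loss(h(z), y) for h, y in zip(heads, ys))
loss = loss + l1 * sum(h.weight.abs().sum() for h in heads)  # L1 on head weights
loss.backward()
```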
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - Provable Generalization of Overparameterized Meta-learning Trained with
SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z) - Benign Overfitting in Multiclass Classification: All Roads Lead to
Interpolation [39.02017410837255]
We study benign overfitting in multiclass linear classification.
We consider several standard training algorithms on separable data.
We derive novel bounds on the accuracy of the minimum-norm interpolating (MNI) classifier.
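The MNI classifier at the center of the analysis is easy to construct explicitly. A minimal NumPy sketch on one-hot labels (dimensions and data are illustrative; the least-norm solution is obtained via the pseudoinverse):

```python
import numpy as np

n, d, k = 50, 200, 3                 # overparameterized regime: d > n
X = np.random.randn(n, d)
y = np.random.randint(0, k, size=n)
Y = np.eye(k)[y]                     # one-hot targets, shape (n, k)

W = np.linalg.pinv(X) @ Y            # min-norm W solving X W = Y (X has full row rank)
assert np.allclose(X @ W, Y)         # interpolates: training error is exactly zero
pred = (X @ W).argmax(axis=1)        # class decisions of the MNI classifier
```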
arXiv Detail & Related papers (2021-06-21T05:34:36Z) - Revisiting Unsupervised Meta-Learning: Amplifying or Compensating for
the Characteristics of Few-Shot Tasks [30.893785366366078]
We develop a practical approach towards few-shot image classification, where a visual recognition system is constructed with limited data.
We find that the base class set labels are not necessary, and discriminative embeddings could be meta-learned in an unsupervised manner.
Experiments on few-shot learning benchmarks verify that our approaches outperform previous methods by a 4-10% performance gap.
arXiv Detail & Related papers (2020-11-30T10:08:35Z) - Theoretical Insights Into Multiclass Classification: A High-dimensional
Asymptotic View [82.80085730891126]
We provide the first precise high-dimensional asymptotic analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via
Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
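A minimal sketch of the kind of simple BiLSTM classifier trained with cross-entropy that serves as the starting point (a generic baseline; the paper's mixed objective and exact architecture are not reproduced here):

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab=10000, emb=128, hidden=256, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, classes)   # 2x for both directions

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        out, _ = self.lstm(self.embed(tokens))
        return self.fc(out.mean(dim=1))            # mean-pool over time steps

model = BiLSTMClassifier()
tokens = torch.randint(0, 10000, (4, 20))          # dummy token ids
labels = torch.randint(0, 2, (4,))
loss = nn.CrossEntropyLoss()(model(tokens), labels)
```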
arXiv Detail & Related papers (2020-09-08T21:55:22Z) - Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition
from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to a mismatch between the long-tailed class distributions that models are trained on and the balanced performance expected of them at test time.
We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
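For context, classic class-balanced learning reweights the loss by inverse (effective) class frequency. The sketch below uses the effective-number weighting of Cui et al. as a stand-in for that classic baseline, which the paper's meta-learning approach augments rather than replaces:

```python
import torch
import torch.nn as nn

counts = torch.tensor([5000., 500., 50.])        # long-tailed class counts
beta = 0.999
effective = (1 - beta ** counts) / (1 - beta)    # "effective number" of samples
weights = 1.0 / effective
weights = weights / weights.sum() * len(counts)  # normalize to mean weight 1

criterion = nn.CrossEntropyLoss(weight=weights)  # rare classes weigh more
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = criterion(logits, labels)
```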
arXiv Detail & Related papers (2020-03-24T11:28:42Z)