Nested Learning For Multi-Granular Tasks
- URL: http://arxiv.org/abs/2007.06402v1
- Date: Mon, 13 Jul 2020 14:27:14 GMT
- Title: Nested Learning For Multi-Granular Tasks
- Authors: Raphaël Achddou, J. Matias di Martino, Guillermo Sapiro
- Abstract summary: Standard deep neural networks (DNNs) are commonly trained in an end-to-end fashion for specific tasks.
Such models are often overconfident and generalize poorly to samples that are not from the original training distribution.
We introduce the concept of nested learning: how to obtain a hierarchical representation of the input.
We show that nested learning outperforms the same network trained in the standard end-to-end fashion.
- Score: 24.600419295290504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard deep neural networks (DNNs) are commonly trained in an end-to-end
fashion for specific tasks such as object recognition, face identification, or
character recognition, among many examples. This specificity often leads to
overconfident models that generalize poorly to samples that are not from the
original training distribution. Moreover, such standard DNNs cannot leverage
information from heterogeneously annotated training data, where, for
example, labels may be provided at different levels of granularity.
Furthermore, DNNs do not simultaneously produce predictions at different levels of
detail with corresponding levels of confidence; they most commonly take an
all-or-nothing approach. To address these challenges, we introduce the concept of
nested learning: how to obtain a hierarchical representation of the input such
that a coarse label can be extracted first and the representation can then be
sequentially refined, if the sample permits, to obtain successively finer
predictions, each with its corresponding confidence. We explicitly
enforce this behavior by creating a sequence of nested information bottlenecks.
Looking at the problem of nested learning from an information theory
perspective, we design a network topology with two important properties. First,
a sequence of low-dimensional (nested) feature embeddings is enforced. Second, we
show how the explicit combination of nested outputs can improve both the
robustness and the accuracy of finer predictions. Experimental results on
CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, DBpedia, and PlantVillage
demonstrate that nested learning outperforms the same network trained in the
standard end-to-end fashion.
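To make the idea concrete, below is a minimal sketch of a nested-learning-style classifier in PyTorch, assuming a hypothetical two-level hierarchy (coarse superclasses refined into fine classes), CIFAR-10-like 3x32x32 inputs, and a known fine-to-coarse label mapping. It illustrates a nested bottleneck and joint supervision at both granularities; it is not the authors' exact architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NestedClassifier(nn.Module):
    """Sketch of nested learning: a low-dimensional coarse bottleneck whose
    embedding is reused (together with the trunk features) by a finer head."""
    def __init__(self, n_coarse=2, n_fine=10, coarse_dim=8, fine_dim=32):
        super().__init__()
        # Shared convolutional trunk (3x32x32 inputs assumed).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        feat_dim = 64 * 8 * 8
        # First (nested) information bottleneck: a small coarse embedding.
        self.coarse_embed = nn.Linear(feat_dim, coarse_dim)
        self.coarse_head = nn.Linear(coarse_dim, n_coarse)
        # Refinement branch: trunk features + coarse embedding -> finer embedding.
        self.fine_embed = nn.Linear(feat_dim + coarse_dim, fine_dim)
        self.fine_head = nn.Linear(fine_dim, n_fine)

    def forward(self, x):
        h = self.trunk(x)
        z_coarse = F.relu(self.coarse_embed(h))
        logits_coarse = self.coarse_head(z_coarse)
        z_fine = F.relu(self.fine_embed(torch.cat([h, z_coarse], dim=1)))
        logits_fine = self.fine_head(z_fine)
        return logits_coarse, logits_fine


def nested_loss(logits_coarse, logits_fine, y_fine, fine_to_coarse, w=0.5):
    """Supervise both granularities; samples annotated only at the coarse
    level could be trained with the coarse term alone (omitted for brevity)."""
    y_coarse = fine_to_coarse[y_fine]  # map fine labels to their superclass
    return w * F.cross_entropy(logits_coarse, y_coarse) + \
           (1 - w) * F.cross_entropy(logits_fine, y_fine)
```

At test time, the coarse softmax can be reported even when the fine head is not confident, mirroring the idea of predictions at several levels of detail, each with its own confidence.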
Related papers
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems [9.660129425150926]
Cross-entropy loss can easily lead to models that exhibit severe overfitting.
In this paper, we prove that the existing cross-entropy loss minimization for training DNN classifiers essentially learns the conditional entropy of the underlying data distribution.
We propose a mutual information learning framework in which we train DNN classifiers by learning the mutual information between the label and the input.
arXiv Detail & Related papers (2022-10-03T15:09:19Z)
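As a concrete illustration of the entry above, the sketch below shows one common batch estimator of the label-input mutual information for a softmax classifier, I(Y;X) ≈ H(E_x[p(y|x)]) − E_x[H(p(y|x))]. It conveys the general mutual-information training objective and is not claimed to be the cited paper's exact loss.

```python
import torch

def batch_mutual_information(logits, eps=1e-8):
    """Estimate I(Y;X) over a batch as H(marginal) - mean conditional entropy.
    Maximizing this (e.g., adding its negative to the training loss) pushes
    the classifier toward high label-input mutual information."""
    p = torch.softmax(logits, dim=1)           # p(y|x) for each sample
    p_marginal = p.mean(dim=0)                 # empirical marginal p(y)
    h_marginal = -(p_marginal * (p_marginal + eps).log()).sum()
    h_conditional = -(p * (p + eps).log()).sum(dim=1).mean()
    return h_marginal - h_conditional
```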
- Learning to Imagine: Diversify Memory for Incremental Learning using Unlabeled Data [69.30452751012568]
We develop a learnable feature generator to diversify exemplars by adaptively generating diverse counterparts of exemplars.
We introduce semantic contrastive learning to enforce the generated samples to be semantically consistent with the exemplars.
Our method does not bring any extra inference cost and outperforms state-of-the-art methods on two benchmarks.
arXiv Detail & Related papers (2022-04-19T15:15:18Z)
- Compare learning: bi-attention network for few-shot learning [6.559037166322981]
Metric learning, one family of few-shot learning methods, addresses this challenge by first learning a deep distance metric to determine whether a pair of images belongs to the same category.
In this paper, we propose a novel approach named Bi-attention network to compare the instances, which can measure the similarity between embeddings of instances precisely, globally and efficiently.
arXiv Detail & Related papers (2022-03-25T07:39:10Z)
- Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distances between the test image and its top-k neighbors in the training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
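A minimal sketch of the two-step k-NN recipe described in the entry above: extract frozen pre-trained features, then classify by majority vote over the nearest training embeddings. The feature extractor `embed`, the cosine similarity, and the majority vote are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def knn_predict(embed, train_x, train_y, test_x, k=20):
    """k-NN classification on top of frozen, pre-trained representations."""
    with torch.no_grad():
        train_f = F.normalize(embed(train_x), dim=1)   # step 1: embed the data
        test_f = F.normalize(embed(test_x), dim=1)
    sims = test_f @ train_f.t()                        # cosine similarities
    _, idx = sims.topk(k, dim=1)                       # top-k neighbors per test image
    neighbor_labels = train_y[idx]                     # (n_test, k) label matrix
    return neighbor_labels.mode(dim=1).values          # step 2: majority vote
```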
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets.
However, labeling large-scale data can be very costly and error-prone, so it is difficult to guarantee annotation quality.
We propose Temporal Calibrated Regularization (TCR), which jointly utilizes the original labels and the predictions from the previous epoch.
arXiv Detail & Related papers (2020-07-01T04:48:49Z)
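A hedged sketch of the general idea behind the TCR entry above: soften the possibly noisy one-hot labels by mixing them with the model's own predictions from the previous epoch. The mixing weight `alpha` and the plain soft cross-entropy are illustrative choices, not the paper's exact regularizer.

```python
import torch
import torch.nn.functional as F

def calibrated_target_loss(logits, labels, prev_epoch_probs, alpha=0.7):
    """Soft cross-entropy against a mix of the original labels and the
    previous epoch's predictions (assumed stored per sample)."""
    n_classes = logits.size(1)
    one_hot = F.one_hot(labels, n_classes).float()
    target = alpha * one_hot + (1.0 - alpha) * prev_epoch_probs
    log_p = F.log_softmax(logits, dim=1)
    return -(target * log_p).sum(dim=1).mean()
```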
- One Versus all for deep Neural Network Incertitude (OVNNI) quantification [12.734278426543332]
We propose a new technique to quantify the epistemic uncertainty of data easily.
This method mixes the predictions of an ensemble of DNNs, each trained to classify one class vs. all the other classes (OVA), with the predictions of a standard DNN trained to perform all-vs-all (AVA) classification.
arXiv Detail & Related papers (2020-06-01T14:06:12Z)
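Following the description in the OVNNI entry above, here is a minimal sketch of the scoring step: the AVA softmax is combined with the per-class OVA probabilities (an elementwise product is assumed here as the combination rule), so that inputs none of the OVA models recognize end up with uniformly low scores, signalling epistemic uncertainty. `ava_model` and `ova_models` (one binary classifier per class, each emitting a single logit) are assumed to be already trained.

```python
import torch

def ovnni_scores(x, ava_model, ova_models):
    """Combine All-vs-All softmax probabilities with One-vs-All scores."""
    with torch.no_grad():
        ava = torch.softmax(ava_model(x), dim=1)                    # (n, n_classes)
        ova = torch.stack(
            [torch.sigmoid(m(x)).squeeze(1) for m in ova_models],   # each m: (n, 1)
            dim=1,                                                   # -> (n, n_classes)
        )
    return ava * ova   # low everywhere when the OVA models are all unsure
```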
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- One-vs-Rest Network-based Deep Probability Model for Open Set Recognition [6.85316573653194]
An intelligent self-learning system should be able to differentiate between known and unknown examples.
One-vs-rest networks can provide more informative hidden representations for unknown examples than the commonly used SoftMax layer.
The proposed probability model outperformed state-of-the-art methods in open set classification scenarios.
arXiv Detail & Related papers (2020-04-17T05:24:34Z)
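To make the one-vs-rest idea in the last entry concrete, here is a hedged sketch of open-set prediction with per-class sigmoid outputs: a sample is labelled unknown when no class score clears a threshold. The fixed threshold and the simple rejection rule are illustrative simplifications of the cited paper's probability model.

```python
import torch

def open_set_predict(logits, threshold=0.5, unknown_label=-1):
    """One-vs-rest open-set decision: reject when every class score is low."""
    probs = torch.sigmoid(logits)               # independent per-class scores
    max_prob, pred = probs.max(dim=1)
    pred = pred.clone()
    pred[max_prob < threshold] = unknown_label  # mark low-confidence samples unknown
    return pred
```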
This list is automatically generated from the titles and abstracts of the papers on this site.