Learning from Matured Dumb Teacher for Fine Generalization
- URL: http://arxiv.org/abs/2108.05776v1
- Date: Thu, 12 Aug 2021 14:37:36 GMT
- Title: Learning from Matured Dumb Teacher for Fine Generalization
- Authors: HeeSeung Jung, Kangil Kim, Hoyong Kim and Jong-Hun Shin
- Abstract summary: We show that random, untrained, and equally structured teacher networks can vastly improve generalization performance.
We propose matured dumb teacher based KD, which conservatively transfers the hypothesis to the student for generalization without massively destroying the trained information.
- Score: 0.6079137591620588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The flexibility of decision boundaries in neural networks that are unguided
by training data is a well-known problem typically resolved with generalization
methods. A surprising result from recent knowledge distillation (KD) literature
is that random, untrained, and equally structured teacher networks can also
vastly improve generalization performance. This raises the possibility that
undiscovered assumptions useful for generalization in uncertain regions remain
to be found. In this paper, we shed light on these assumptions by analyzing decision
boundaries and confidence distributions of both simple and KD-based
generalization methods. Assuming that a decision boundary exists to represent
the most general tendency of distinction on an input sample space (i.e., the
simplest hypothesis), we show the various limitations of existing methods in
using this hypothesis. To resolve these limitations, we propose matured dumb
teacher based KD, which conservatively transfers the hypothesis to the student
for generalization without massively destroying the trained information. In practical
experiments on feed-forward and convolutional neural networks for image
classification tasks on MNIST, CIFAR-10, and CIFAR-100 datasets, the proposed
method shows stable improvement in the best test performance found by a grid search
of hyperparameters. The analysis and results imply that the proposed method can
provide finer generalization than existing methods.
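As a rough illustration of the setup the abstract describes (distillation from a randomly initialized, untrained teacher with the same structure, keeping the transfer conservative), here is a minimal sketch in PyTorch. The temperature, the mixing weight `alpha`, and the helper names are assumptions for illustration only; the paper's exact "matured dumb teacher" construction is not reproduced here.

```python
import torch
import torch.nn.functional as F

def dumb_teacher_kd_loss(student_logits, teacher_logits, targets,
                         temperature=4.0, alpha=0.1):
    """Cross-entropy on the labels plus a small KL term toward the softened
    outputs of a randomly initialized ("dumb") teacher.

    `temperature` and `alpha` are illustrative values, not taken from the
    paper; a small `alpha` keeps the transfer conservative so the student's
    trained information is not overwritten.
    """
    # Standard supervised term on the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)

    # Distillation term: KL divergence between softened student and teacher
    # output distributions, scaled by T^2 as in standard KD.
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)

    return (1.0 - alpha) * ce + alpha * kd

# Usage sketch: the teacher is an untrained copy of the student architecture,
# kept frozen while the student trains.
#   teacher = StudentNet()   # same structure, random weights (hypothetical class)
#   teacher.eval()
#   with torch.no_grad():
#       teacher_logits = teacher(x)
#   loss = dumb_teacher_kd_loss(student(x), teacher_logits, y)
```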
Related papers
- Uncertainty Quantification via Hölder Divergence for Multi-View Representation Learning [18.419742575630217]
This paper introduces a novel algorithm based on Hölder Divergence (HD) to enhance the reliability of multi-view learning.
Through Dempster-Shafer theory, uncertainty from different modalities is integrated, thereby generating a comprehensive result.
Mathematically, HD proves to better measure the "distance" between the real data distribution and the model's predictive distribution.
arXiv Detail & Related papers (2024-10-29T04:29:44Z)
- Multi-dimensional domain generalization with low-rank structures [18.565189720128856]
In statistical and machine learning methods, it is typically assumed that the test data are identically distributed with the training data.
This assumption does not always hold, especially in applications where the target population is not well represented in the training data.
We present a novel approach to addressing this challenge in linear regression models.
arXiv Detail & Related papers (2023-09-18T08:07:58Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample, according to which we dynamically reweight the objective loss terms to make the network focus more on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
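The $\mathcal{I}$-EDL entry above describes reweighting evidential loss terms by Fisher information. The sketch below is an illustrative simplification under common evidential deep learning assumptions (a head that outputs Dirichlet concentrations `alpha`): it weights the standard evidential squared-error terms by the diagonal of the Dirichlet Fisher information. It is not the exact objective from that paper.

```python
import torch

def fim_weighted_edl_loss(alpha, targets_onehot):
    """Illustrative Fisher-information-weighted evidential loss.

    `alpha`: Dirichlet concentrations (batch, classes) from an EDL head,
    e.g. softplus(logits) + 1. Classes with little evidence carry large
    Fisher information, so weighting by it focuses training on uncertain
    classes. This is a simplified sketch, not the paper's exact loss.
    """
    strength = alpha.sum(dim=1, keepdim=True)    # alpha_0
    prob = alpha / strength                      # expected class probabilities
    # Diagonal of the Dirichlet Fisher information:
    # trigamma(alpha_k) - trigamma(alpha_0).
    fim_diag = torch.polygamma(1, alpha) - torch.polygamma(1, strength)

    # Standard evidential MSE terms: squared error plus predictive variance.
    err = (targets_onehot - prob) ** 2
    var = prob * (1.0 - prob) / (strength + 1.0)
    return (fim_diag * (err + var)).sum(dim=1).mean()
```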
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our method is simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Selective Classification via One-Sided Prediction [54.05407231648068]
A one-sided prediction (OSP) based relaxation yields a selective classification (SC) scheme that attains near-optimal coverage in the practically relevant high target accuracy regime.
We theoretically derive generalization bounds for SC and OSP, and empirically show that our scheme strongly outperforms state-of-the-art methods in coverage at small error levels.
arXiv Detail & Related papers (2020-10-15T16:14:27Z)
- Density Fixing: Simple yet Effective Regularization Method based on the Class Prior [2.3859169601259347]
We propose a framework of regularization methods, called density-fixing, that can be used commonly for supervised and semi-supervised learning.
Our proposed regularization method improves generalization performance by forcing the model to approximate the class prior distribution, i.e., the frequency of occurrence.
arXiv Detail & Related papers (2020-07-08T04:58:22Z)
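As a rough illustration of the density-fixing idea in the entry above (pulling the model's predictive distribution toward the class prior), here is a minimal sketch. The penalty form, the `class_prior` argument, and the weighting coefficient are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def density_fixing_penalty(logits, class_prior, eps=1e-8):
    """Illustrative density-fixing style regularizer.

    Penalizes the KL divergence between the batch-averaged predictive
    distribution and a known class prior (the frequency of occurrence).
    `class_prior` is a length-C tensor summing to 1.
    """
    mean_pred = F.softmax(logits, dim=1).mean(dim=0)  # average prediction over the batch
    return torch.sum(mean_pred * (torch.log(mean_pred + eps)
                                  - torch.log(class_prior + eps)))

# Usage sketch (lambda_df is an assumed weighting coefficient):
#   loss = F.cross_entropy(logits, targets) + lambda_df * density_fixing_penalty(logits, prior)
```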