Better Supervisory Signals by Observing Learning Paths
- URL: http://arxiv.org/abs/2203.02485v1
- Date: Fri, 4 Mar 2022 18:31:23 GMT
- Title: Better Supervisory Signals by Observing Learning Paths
- Authors: Yi Ren and Shangmin Guo and Danica J. Sutherland
- Abstract summary: We explain two existing label refining methods, label smoothing and knowledge distillation, in terms of our proposed criterion.
We observe the learning path, i.e., the trajectory of the model's predictions during training, for each training sample.
We find that the model can spontaneously refine "bad" labels through a "zig-zag" learning path, which occurs on both toy and real datasets.
- Score: 10.044413937134237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Better-supervised models might have better performance. In this paper, we
first clarify what makes for good supervision for a classification problem, and
then explain two existing label refining methods, label smoothing and knowledge
distillation, in terms of our proposed criterion. To further answer why and how
better supervision emerges, we observe the learning path, i.e., the trajectory
of the model's predictions during training, for each training sample. We find
that the model can spontaneously refine "bad" labels through a "zig-zag"
learning path, which occurs on both toy and real datasets. Observing the
learning path not only provides a new perspective for understanding knowledge
distillation, overfitting, and learning dynamics, but also reveals that the
supervisory signal of a teacher network can be very unstable near the best
points in training on real tasks. Inspired by this, we propose a new knowledge
distillation scheme, Filter-KD, which improves downstream classification
performance in various settings.
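The abstract bundles three concrete ingredients: refining labels via label smoothing and knowledge distillation, recording each training sample's learning path, and Filter-KD, which stabilizes a noisy teacher signal. Below is a minimal PyTorch sketch of those ideas. It is not the authors' code: the function names, hyperparameters, and the reading of Filter-KD as an exponential moving average over the teacher's per-sample predictions are illustrative assumptions.

import torch
import torch.nn.functional as F

NUM_CLASSES = 10  # hypothetical; use the dataset's actual class count

def smoothed_labels(y, eps=0.1, k=NUM_CLASSES):
    # Label smoothing: mix the one-hot target with the uniform distribution.
    one_hot = F.one_hot(y, k).float()
    return (1.0 - eps) * one_hot + eps / k

def kd_targets(teacher_logits, temperature=4.0):
    # Knowledge distillation: soft targets from a (frozen) teacher's logits.
    return F.softmax(teacher_logits / temperature, dim=-1)

def kd_loss(student_logits, soft_targets, temperature=4.0):
    # KL divergence between the student and the refined target distribution.
    log_p = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_p, soft_targets, reduction="batchmean") * temperature ** 2

# Learning path: per-sample trajectory of the model's predictions over training.
# learning_path[i] is the sequence p_t(. | x_i); plotting it on the probability
# simplex is what reveals the "zig-zag" refinement of noisy labels.
learning_path = {}  # sample index -> list of probability vectors

def record_path(sample_indices, logits):
    probs = F.softmax(logits, dim=-1).detach()
    for idx, p in zip(sample_indices.tolist(), probs):
        learning_path.setdefault(idx, []).append(p)

# Filter-KD-style target (assumed form): smooth the teacher's per-sample
# predictions over training with an exponential moving average, so the student
# sees a more stable supervisory signal than any single teacher checkpoint.
filtered_teacher = {}  # sample index -> smoothed probability vector

def filtered_kd_targets(sample_indices, teacher_logits, momentum=0.9):
    probs = F.softmax(teacher_logits, dim=-1).detach()
    out = []
    for idx, p in zip(sample_indices.tolist(), probs):
        prev = filtered_teacher.get(idx, p)
        smoothed = momentum * prev + (1.0 - momentum) * p
        filtered_teacher[idx] = smoothed
        out.append(smoothed)
    return torch.stack(out)

In a training loop, record_path would be called on every mini-batch (passing the samples' dataset indices) to accumulate the trajectories, and the usual one-hot cross-entropy target could be swapped for smoothed_labels, kd_targets, or filtered_kd_targets to compare the three supervisory signals the abstract discusses.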
Related papers
- One-bit Supervision for Image Classification: Problem, Solution, and
Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z) - Weaker Than You Think: A Critical Look at Weakly Supervised Learning [30.160501243686863]
Weakly supervised learning is a popular approach for training machine learning models in low-resource settings.
We analyze diverse NLP datasets and tasks to ascertain when and why weakly supervised approaches work.
arXiv Detail & Related papers (2023-05-27T10:46:50Z) - MDFlow: Unsupervised Optical Flow Learning by Reliable Mutual Knowledge
Distillation [12.249680550252327]
Current approaches impose an augmentation regularization term for continual self-supervision.
We propose a novel mutual distillation framework to transfer reliable knowledge back and forth between the teacher and student networks.
Our approach, termed MDFlow, achieves state-of-the-art real-time accuracy and generalization ability on challenging benchmarks.
arXiv Detail & Related papers (2022-11-11T05:56:46Z) - Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine-grained face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z) - Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z) - Learning by Distillation: A Self-Supervised Learning Framework for
Optical Flow Estimation [71.76008290101214]
DistillFlow is a knowledge distillation approach to learning optical flow.
It achieves state-of-the-art unsupervised learning performance on both KITTI and Sintel datasets.
Our models ranked 1st among all monocular methods on the KITTI 2015 benchmark, and outperform all published methods on the Sintel Final benchmark.
arXiv Detail & Related papers (2021-06-08T09:13:34Z) - Distill on the Go: Online knowledge distillation in self-supervised
learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z) - Unsupervised Class-Incremental Learning Through Confusion [0.4604003661048266]
We introduce a novelty detection method that leverages the network confusion caused by training on incoming data as a new class.
We find that incorporating class imbalance during this detection step substantially enhances performance.
arXiv Detail & Related papers (2021-04-09T15:58:43Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z) - Learning From Multiple Experts: Self-paced Knowledge Distillation for
Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
The proposed LFME framework aggregates the knowledge from multiple 'Expert' models, each trained on a less imbalanced subset of the data, to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)