What makes instance discrimination good for transfer learning?
- URL: http://arxiv.org/abs/2006.06606v2
- Date: Tue, 19 Jan 2021 15:45:44 GMT
- Title: What makes instance discrimination good for transfer learning?
- Authors: Nanxuan Zhao and Zhirong Wu and Rynson W.H. Lau and Stephen Lin
- Abstract summary: We investigate what makes instance discrimination pretraining good for transfer learning.
What really matters for the transfer is low-level and mid-level representations, not high-level representations.
Supervised pretraining can be strengthened by following an exemplar-based approach.
- Score: 82.79808902674282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive visual pretraining based on the instance discrimination pretext
task has made significant progress. Notably, recent work has shown that
unsupervised pretraining can surpass its supervised counterpart when
finetuning on downstream applications such as object detection and
segmentation. It comes as
a surprise that image annotations would be better left unused for transfer
learning. In this work, we investigate the following problems: What makes
instance discrimination pretraining good for transfer learning? What knowledge
is actually learned and transferred from these models? From this understanding
of instance discrimination, how can we better exploit human annotation labels
for pretraining? Our findings are threefold. First, what truly matters for the
transfer is low-level and mid-level representations, not high-level
representations. Second, the intra-category invariance enforced by the
traditional supervised model weakens transferability by increasing task
misalignment. Finally, supervised pretraining can be strengthened by following
an exemplar-based approach without explicit constraints among the instances
within the same category.
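
To make the pretext task concrete, here is a minimal sketch of an InfoNCE-style instance discrimination loss of the kind this line of work builds on; the MoCo-style memory queue, the function name, and the temperature value are illustrative assumptions rather than the paper's exact setup. The exemplar-based supervised variant suggested above would keep this per-instance objective even when category labels are available, instead of pulling all images of a class toward a shared prototype.

```python
# Minimal sketch of instance discrimination with an InfoNCE loss.
# Assumes a MoCo-style setup: q and k are embeddings of two augmented
# views of the same images; `queue` holds embeddings of other instances.
import torch
import torch.nn.functional as F

def instance_discrimination_loss(q, k, queue, temperature=0.07):
    """q, k: (N, D) view embeddings; queue: (K, D) negatives."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    # Positive logit: similarity between the two views of each instance.
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)             # (N, 1)
    # Negative logits: similarity to every other instance in the queue.
    l_neg = torch.einsum("nd,kd->nk", q, F.normalize(queue, dim=1))  # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # Each image is its own class, so the positive is always at index 0.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```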
Related papers
- Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective [17.304811383730417]
This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task.
From the perspective of information-bottleneck theory, we reveal that the incompatibility between discriminability and transferability arises from the over-compression of input information.
arXiv Detail & Related papers (2022-03-08T06:16:33Z) - Agree to Disagree: Diversity through Disagreement for Better
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data but disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z) - Improving Transferability of Representations via Augmentation-Aware
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision [117.15012005163322]
AugSelf is an auxiliary self-supervised loss that learns to predict the difference in augmentation parameters between two randomly augmented samples.
Our intuition is that AugSelf encourages the model to preserve augmentation-aware information in the learned representations, which could be beneficial for their transferability.
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost.
arXiv Detail & Related papers (2021-11-18T10:43:50Z) - Does Pretraining for Summarization Require Knowledge Transfer? [27.297137706355173]
- Does Pretraining for Summarization Require Knowledge Transfer? [27.297137706355173]
We show that pretraining on character n-grams selected at random can nearly match the performance of models pretrained on real corpora.
This work holds the promise of eliminating the need for upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues.
arXiv Detail & Related papers (2021-09-10T15:54:15Z) - Improve Unsupervised Pretraining for Few-label Transfer [80.58625921631506]
- Improve Unsupervised Pretraining for Few-label Transfer [80.58625921631506]
In this paper, we find that the advantage of unsupervised over supervised pretraining may not hold when the target dataset has very few labeled samples for finetuning.
We propose a new progressive few-label transfer algorithm for real applications.
arXiv Detail & Related papers (2021-07-26T17:59:56Z) - Training GANs with Stronger Augmentations via Contrastive Discriminator [80.8216679195]
We introduce a contrastive representation learning scheme into the GAN discriminator, coined ContraD.
This "fusion" enables the discriminators to work with much stronger augmentations without increasing their training instability.
Our experimental results show that GANs with ContraD consistently improve FID and IS compared to other recent techniques incorporating data augmentations.
arXiv Detail & Related papers (2021-03-17T16:04:54Z) - Are Fewer Labels Possible for Few-shot Learning? [81.89996465197392]
- Are Fewer Labels Possible for Few-shot Learning? [81.89996465197392]
Few-shot learning is challenging due to its very limited data and labels.
Recent studies in big transfer (BiT) show that few-shot learning can greatly benefit from pretraining on a large-scale labeled dataset in a different domain.
We propose eigen-finetuning to enable fewer-shot learning by leveraging the co-evolution of clustering and eigen-samples during finetuning.
arXiv Detail & Related papers (2020-12-10T18:59:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.