Revisiting the Critical Factors of Augmentation-Invariant Representation Learning
- URL: http://arxiv.org/abs/2208.00275v1
- Date: Sat, 30 Jul 2022 17:07:13 GMT
- Title: Revisiting the Critical Factors of Augmentation-Invariant Representation Learning
- Authors: Junqiang Huang, Xiangwen Kong, Xiangyu Zhang
- Abstract summary: We revisit MoCo v2 and BYOL and test the assumption that different frameworks produce representations with different characteristics even under the same pretext task.
We establish the first benchmark for fair comparisons between MoCo v2 and BYOL.
- Score: 8.28445083127418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We focus on better understanding the critical factors of
augmentation-invariant representation learning. We revisit MoCo v2 and BYOL
and test the validity of the following assumption: different frameworks bring
about representations of different characteristics even with the same pretext
task. We establish the first benchmark for fair comparisons between MoCo v2
and BYOL, and observe: (i) sophisticated model configurations enable better
adaptation to the pre-training dataset; (ii) mismatched optimization
strategies between pre-training and fine-tuning hinder the model from
achieving competitive transfer performance. Given the fair benchmark, we
investigate further and find that the asymmetry of the network structure
enables contrastive frameworks to work well under the linear evaluation
protocol, but may hurt transfer performance on long-tailed classification
tasks. Moreover, negative samples do not make models more sensitive to the
choice of data augmentations, and neither does the asymmetric network
structure. We believe our findings provide useful information for future work.
Related papers
- Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness.
We show that training VLMs on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes model learning by paying closer attention to those training samples with a high difference in explanations (see the sketch below).
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
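A hedged sketch of how such a reweighting scheme could look. The explanation method (plain gradient saliency), the helper names, and the weight formula below are assumptions for illustration; the paper's own metric may differ.
```python
# Assumed scheme: upweight samples whose explanations disagree across views.
import torch
import torch.nn.functional as F

def saliency_heatmap(model, x, target):
    """Simple gradient saliency as a stand-in explanation method."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, target.unsqueeze(1)).sum()
    grad, = torch.autograd.grad(score, x)
    return grad.abs().sum(dim=1)          # (B, H, W) heatmaps

def consistency_weights(model, x, x_aug, target, alpha=1.0):
    """Per-sample weights that grow as the two explanations diverge."""
    h = saliency_heatmap(model, x, target).flatten(1)
    h_aug = saliency_heatmap(model, x_aug, target).flatten(1)
    consistency = F.cosine_similarity(h, h_aug, dim=1)   # in [-1, 1]
    return 1.0 + alpha * (1.0 - consistency) / 2.0      # in [1, 1 + alpha]

# The weights would then rescale the per-sample loss, e.g.:
# loss = (consistency_weights(model, x, x_aug, y).detach()
#         * F.cross_entropy(model(x), y, reduction="none")).mean()
```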
- Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders [6.7181844004432385]
The Inter-Intra Modal Measure (IIMM) functions as a strong predictor of performance changes with fine-tuning.
Fine-tuning on tasks with higher IIMM scores produces greater in-domain performance gains but also induces more severe out-of-domain performance degradation.
With only a single forward pass over the target data, practitioners can use this measure to estimate how much a model is likely to improve after fine-tuning (see the sketch below).
arXiv Detail & Related papers (2024-07-22T15:35:09Z)
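The summary does not reproduce the IIMM formula itself, so the sketch below is only one plausible reading: from a single forward pass of a CLIP-like model on the target data, combine intra-modal (image-image) and inter-modal (image-text) cosine similarities into one scalar. The exact combination is an assumption.
```python
# Assumed combination: mean intra-modal similarity minus mean paired
# inter-modal similarity; illustration only, not the paper's definition.
import torch
import torch.nn.functional as F

@torch.no_grad()
def iimm_like_score(image_emb: torch.Tensor, text_emb: torch.Tensor) -> float:
    """One forward pass of the target data yields these (N, D) embeddings."""
    img = F.normalize(image_emb, dim=1)
    txt = F.normalize(text_emb, dim=1)
    n = img.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool)
    intra = (img @ img.t())[off_diag].mean()     # intra-modal similarity
    inter = (img * txt).sum(dim=1).mean()        # paired inter-modal similarity
    return (intra - inter).item()
```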
- Local Consensus Enhanced Siamese Network with Reciprocal Loss for Two-view Correspondence Learning [35.5851523517487]
Two-view correspondence learning usually establishes an end-to-end network to jointly predict correspondence reliability and relative pose.
We propose a Local Feature Consensus (LFC) plugin block to augment the features of existing models.
We extend existing models to a Siamese network with a reciprocal loss that exploits the supervision of mutual projection.
arXiv Detail & Related papers (2023-08-06T22:20:09Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes from balancing intra- and inter-class diversity (see the sketch below).
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
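A small sketch of the trade-off: with a fixed budget N, choosing the number of classes C fixes the samples per class m = N // C. The dataset layout and function name are illustrative assumptions, not the paper's code.
```python
# Build fixed-size pre-training subsets that trade class count against
# samples per class.
import random

def make_subset(samples_by_class: dict, budget: int, num_classes: int):
    """Draw `budget` examples spread over `num_classes` classes."""
    per_class = budget // num_classes            # intra-class diversity m
    classes = random.sample(sorted(samples_by_class), num_classes)
    subset = []
    for c in classes:                            # inter-class diversity C
        subset += random.sample(samples_by_class[c], per_class)
    return subset

# e.g. with budget=10_000, compare (C=100, m=100) against (C=1000, m=10);
# the finding above suggests the best transfer sits between such extremes.
```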
- Weak Augmentation Guided Relational Self-Supervised Learning [80.0680103295137]
We introduce a novel relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances.
Our proposed method employs a sharpened distribution of pairwise similarities among different instances as the relation metric.
Experimental results show that ReSSL substantially outperforms state-of-the-art methods across different network architectures (see the sketch below).
arXiv Detail & Related papers (2022-03-16T16:14:19Z)
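A hedged sketch of a ReSSL-style relation loss: similarities of each instance to a memory of other instances define a distribution; the weakly augmented (teacher) view is sharpened with a lower temperature and supervises the strongly augmented (student) view. Temperatures, dimensions, and the memory size are illustrative.
```python
# Relation matching between a sharp teacher and a softer student distribution.
import torch
import torch.nn.functional as F

def relation_loss(z_strong, z_weak, memory, t_student=0.1, t_teacher=0.04):
    z_s = F.normalize(z_strong, dim=1)
    z_w = F.normalize(z_weak, dim=1)
    mem = F.normalize(memory, dim=1)
    log_p_student = F.log_softmax(z_s @ mem.t() / t_student, dim=1)
    with torch.no_grad():                        # sharpened teacher relation
        p_teacher = F.softmax(z_w @ mem.t() / t_teacher, dim=1)
    return -(p_teacher * log_p_student).sum(dim=1).mean()

loss = relation_loss(torch.randn(8, 128), torch.randn(8, 128),
                     torch.randn(1024, 128))
```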
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations [8.336315962271396]
We look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA).
MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models.
We believe that extending the success of ensembles to inference of higher-quality representations is an important step that will open many new applications of ensembling (see the sketch below).
arXiv Detail & Related papers (2021-06-15T10:49:46Z)
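A minimal sketch of MeTTA-style inference, assuming a generic encoder: embed several augmented copies of each input and average the embeddings before (linear) evaluation. The encoder and the augmentation list are placeholders.
```python
# Average embeddings over test-time augmentations before evaluation.
import torch

@torch.no_grad()
def mean_embedding(encoder, x, augmentations):
    embs = [encoder(aug(x)) for aug in augmentations]  # K forward passes
    return torch.stack(embs, dim=0).mean(dim=0)        # (B, D) mean embedding

# usage: feats = mean_embedding(model, images, [flip, crop, lambda t: t])
# feats then feed the linear classifier exactly as single-view features would.
```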
- Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning [23.00300794016583]
State-of-the-art natural language understanding classification models follow a two-stage pipeline of pre-training and fine-tuning.
We propose a supervised contrastive learning (SCL) objective for the fine-tuning stage.
Our proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data (see the sketch below).
arXiv Detail & Related papers (2020-11-03T01:10:39Z)
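A hedged sketch of a supervised contrastive (SCL) term in the standard SupCon form: embeddings sharing a label attract, all others repel. The temperature and toy batch are illustrative, and in practice such a term is typically combined with the usual cross-entropy loss.
```python
# Supervised contrastive loss over a batch of labeled embeddings.
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                       # (B, B) logits
    self_mask = torch.eye(z.size(0), dtype=torch.bool)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))     # drop self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)       # anchors w/o positives -> 0
    return -(pos_log_prob.sum(dim=1) / pos_counts).mean()

loss = supcon_loss(torch.randn(8, 128), torch.randint(0, 3, (8,)))
```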
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves distributional-shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.