Unraveling Projection Heads in Contrastive Learning: Insights from
Expansion and Shrinkage
- URL: http://arxiv.org/abs/2306.03335v1
- Date: Tue, 6 Jun 2023 01:13:18 GMT
- Title: Unraveling Projection Heads in Contrastive Learning: Insights from
Expansion and Shrinkage
- Authors: Yu Gui, Cong Ma, Yiqiao Zhong
- Abstract summary: We aim to demystify the observed phenomenon where representations learned before projectors outperform those learned after.
We identify two crucial effects -- expansion and shrinkage -- induced by the contrastive loss on the projectors.
We propose a family of linear transformations to accurately model the projector's behavior.
- Score: 9.540723320001621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the role of projection heads, also known as projectors, within
the encoder-projector framework (e.g., SimCLR) used in contrastive learning. We
aim to demystify the observed phenomenon where representations learned before
projectors outperform those learned after -- measured using the downstream
linear classification accuracy, even when the projectors themselves are linear.
In this paper, we make two significant contributions towards this aim.
Firstly, through empirical and theoretical analysis, we identify two crucial
effects -- expansion and shrinkage -- induced by the contrastive loss on the
projectors. In essence, contrastive loss either expands or shrinks the signal
direction in the representations learned by an encoder, depending on factors
such as the augmentation strength, the temperature used in contrastive loss,
etc. Secondly, drawing inspiration from the expansion and shrinkage phenomenon,
we propose a family of linear transformations to accurately model the
projector's behavior. This enables us to precisely characterize the downstream
linear classification accuracy in the high-dimensional asymptotic limit. Our
findings reveal that linear projectors operating in the shrinkage (or
expansion) regime hinder (or improve) the downstream classification accuracy.
This provides the first theoretical explanation as to why (linear) projectors
impact the downstream performance of learned representations. Our theoretical
findings are further corroborated by extensive experiments on both synthetic
data and real image data.
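To make the setup concrete, here is a minimal sketch of the encoder-projector pipeline the abstract refers to, together with an InfoNCE-style contrastive loss whose temperature is one of the factors said to drive expansion versus shrinkage, and a helper for extracting pre- versus post-projector features for a downstream linear probe. The architecture widths, temperature value, and helper names are illustrative placeholders, not the paper's experimental configuration.

```python
# Minimal sketch (PyTorch) of the encoder-projector framework described above.
# All widths, the temperature, and the function names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderProjector(nn.Module):
    def __init__(self, in_dim=128, rep_dim=64, proj_dim=32):
        super().__init__()
        # Encoder f: produces the pre-projector representation h = f(x).
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, rep_dim))
        # Linear projector g: the object analyzed in the paper.
        self.projector = nn.Linear(rep_dim, proj_dim, bias=False)

    def forward(self, x):
        h = self.encoder(x)       # representation before the projector
        z = self.projector(h)     # representation after the projector
        return h, z

def info_nce(z1, z2, temperature=0.5):
    """NT-Xent / InfoNCE loss on two augmented views of the same batch.
    The temperature is one of the factors the paper identifies as driving
    expansion or shrinkage of the signal direction."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature             # (n, n) similarity matrix
    labels = torch.arange(z1.size(0))              # positive pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

@torch.no_grad()
def features_for_linear_probe(model, x, use_projector=False):
    """Frozen features fed to a downstream linear classifier; comparing
    use_projector=False vs. True reproduces the pre-/post-projector gap."""
    h, z = model(x)
    return z if use_projector else h
```

Comparing the probe accuracy on `h` with that on `z` is exactly the before/after comparison the paper analyzes: per the abstract, whether the (linear) projector shrinks or expands the signal direction of `h` determines which set of features performs better downstream.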
Related papers
- Projection Head is Secretly an Information Bottleneck [33.755883011145755]
We develop an in-depth theoretical understanding of the projection head from the information-theoretic perspective.
By establishing the theoretical guarantees on the downstream performance of the features before the projector, we reveal that an effective projector should act as an information bottleneck.
Our methods exhibit consistent improvement in the downstream performance across various real-world datasets.
arXiv Detail & Related papers (2025-03-01T14:23:31Z)
- Improving Nonlinear Projection Heads using Pretrained Autoencoder Embeddings [0.10241134756773229]
Using a pretrained autoencoder embedding in the projector can increase classification accuracy by up to 2.9% (1.7% on average).
Our results also suggest that using the sigmoid and tanh activation functions within the projector can outperform ReLU in terms of peak and average classification accuracy.
arXiv Detail & Related papers (2024-08-25T11:10:33Z)
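As a rough illustration of the idea summarized in this entry, the sketch below builds a nonlinear projection head whose first layer can be seeded from a pretrained autoencoder's embedding weights and which uses tanh in place of ReLU. The dimensions and the `pretrained_embedding_weight` tensor are hypothetical placeholders, not the authors' configuration.

```python
# Sketch only: a projection head that (a) reuses a pretrained autoencoder
# embedding as its first layer and (b) uses tanh instead of ReLU.
# `pretrained_embedding_weight` is a hypothetical tensor of shape (hidden, rep_dim).
import torch
import torch.nn as nn

def build_projector(rep_dim=64, hidden=128, out_dim=32,
                    pretrained_embedding_weight=None):
    first = nn.Linear(rep_dim, hidden)
    if pretrained_embedding_weight is not None:
        with torch.no_grad():
            # Initialize from the autoencoder embedding instead of random weights.
            first.weight.copy_(pretrained_embedding_weight)
    return nn.Sequential(first, nn.Tanh(), nn.Linear(hidden, out_dim))
```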
- Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
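The key operation in this entry, tuning in a direction orthogonal to the subspace spanned by previous tasks' features, can be sketched generically as a null-space gradient projection. The rank heuristic below is an assumption for illustration, not the paper's exact approximation scheme.

```python
# Generic sketch of null-space gradient projection (not the paper's exact method).
# prev_feats: (n_samples, d) features from previous tasks; grad: (d,) prompt gradient.
import torch

def project_to_null_space(grad, prev_feats, rank=None):
    # Orthonormal basis of the subspace spanned by previous tasks' features.
    _, s, vh = torch.linalg.svd(prev_feats, full_matrices=False)
    if rank is None:
        # Assumed heuristic: keep singular directions above a relative threshold.
        rank = int((s > 1e-3 * s[0]).sum())
    basis = vh[:rank]                              # (rank, d) row-space basis
    # Remove the component lying in that subspace; what remains is orthogonal
    # to previous tasks' features, so updating along it limits interference.
    return grad - basis.t() @ (basis @ grad)
```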
- Investigating the Benefits of Projection Head for Representation Learning [11.20245728716827]
An effective technique for obtaining high-quality representations is adding a projection head on top of the encoder during training, then discarding it and using the pre-projection representations.
The pre-projection representations are not directly optimized by the loss function, raising the question: what makes them better?
We show that implicit bias of training algorithms leads to layer-wise progressive feature weighting, where features become increasingly unequal as we go deeper into the layers.
arXiv Detail & Related papers (2024-03-18T00:48:58Z)
- ViT-Calibrator: Decision Stream Calibration for Vision Transformer [49.60474757318486]
We propose a new paradigm dubbed Decision Stream that boosts the performance of general Vision Transformers.
We shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions.
arXiv Detail & Related papers (2023-04-10T02:40:24Z)
- Understanding the Role of the Projector in Knowledge Distillation [22.698845243751293]
We revisit the efficacy of knowledge distillation as a function matching and metric learning problem.
We verify three important design decisions, namely the normalisation, soft maximum function, and projection layers.
We attain a 77.2% top-1 accuracy with DeiT-Ti on ImageNet.
arXiv Detail & Related papers (2023-03-20T13:33:31Z)
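To illustrate the kind of design decisions this entry examines, here is a generic distillation sketch that combines a projection layer, feature normalisation, and a temperature-softened softmax term. The widths, temperature, and loss weighting are assumed values, not the settings verified in the paper.

```python
# Generic distillation sketch with a projector, normalisation, and softened
# softmax (illustrative sizes and hyper-parameters only).
import torch
import torch.nn as nn
import torch.nn.functional as F

projector = nn.Linear(192, 768)   # map student width to teacher width (assumed sizes)

def distill_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                 tau=4.0, alpha=0.5):
    # Feature term: match L2-normalised, projected student features to the teacher.
    s = F.normalize(projector(student_feat), dim=1)
    t = F.normalize(teacher_feat, dim=1)
    feat_term = (s - t).pow(2).sum(dim=1).mean()
    # Logit term: KL divergence between temperature-softened distributions.
    kd_term = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                       F.softmax(teacher_logits / tau, dim=1),
                       reduction="batchmean") * tau * tau
    return alpha * feat_term + (1 - alpha) * kd_term
```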
- Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z)
- Toward a Geometrical Understanding of Self-supervised Contrastive Learning [55.83778629498769]
Self-supervised learning (SSL) is one of the premier techniques to create data representations that are actionable for transfer learning in the absence of human annotations.
Mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector.
In this paper, we investigate how the strength of the data augmentation policies affects the data embedding.
arXiv Detail & Related papers (2022-05-13T23:24:48Z)
- Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence [100.6913091147422]
Existing rotated object detectors are mostly inherited from the horizontal detection paradigm.
In this paper, we are motivated to change the design of the rotation regression loss from an induction paradigm to a deduction methodology.
arXiv Detail & Related papers (2021-06-03T14:29:19Z)
- Direct phase modulation via optical injection: theoretical study [50.591267188664666]
We study the influence of spontaneous emission noise, examine the role of gain non-linearity, and consider the effect of temperature drift.
We formulate practical guidelines that help take these features into account when designing and employing an optical-injection-based phase modulator.
arXiv Detail & Related papers (2020-11-18T13:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.