Exploring Simple Siamese Representation Learning
- URL: http://arxiv.org/abs/2011.10566v1
- Date: Fri, 20 Nov 2020 18:59:33 GMT
- Title: Exploring Simple Siamese Representation Learning
- Authors: Xinlei Chen and Kaiming He
- Abstract summary: We show that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.
Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing.
- Score: 68.37628268182185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Siamese networks have become a common structure in various recent models for
unsupervised visual representation learning. These models maximize the
similarity between two augmentations of one image, subject to certain
conditions for avoiding collapsing solutions. In this paper, we report
surprising empirical results that simple Siamese networks can learn meaningful
representations even using none of the following: (i) negative sample pairs,
(ii) large batches, (iii) momentum encoders. Our experiments show that
collapsing solutions do exist for the loss and structure, but a stop-gradient
operation plays an essential role in preventing collapsing. We provide a
hypothesis on the implication of stop-gradient, and further show
proof-of-concept experiments verifying it. Our "SimSiam" method achieves
competitive results on ImageNet and downstream tasks. We hope this simple
baseline will motivate people to rethink the roles of Siamese architectures for
unsupervised representation learning. Code will be made available.
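Since the abstract describes the method only at a high level, here is a minimal PyTorch-style sketch of the idea, in the spirit of the pseudocode in the paper: two augmented views pass through a shared encoder, a small prediction MLP transforms one side, and a stop-gradient (here `detach()`) is applied to the other. The module names and MLP sizes are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal SimSiam-style sketch: no negative pairs, no large batches,
# no momentum encoder -- only a predictor plus stop-gradient.
import torch
import torch.nn as nn
import torch.nn.functional as F

def neg_cosine(p, z):
    # Negative cosine similarity; z.detach() is the stop-gradient the
    # paper identifies as essential for preventing collapse.
    z = z.detach()
    return -F.cosine_similarity(p, z, dim=-1).mean()

class SimSiam(nn.Module):
    def __init__(self, encoder, dim=2048, pred_dim=512):
        super().__init__()
        self.encoder = encoder              # backbone + projection MLP: x -> z
        self.predictor = nn.Sequential(     # prediction MLP h: z -> p
            nn.Linear(dim, pred_dim),
            nn.BatchNorm1d(pred_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )

    def forward(self, x1, x2):
        # x1, x2: two random augmentations of the same image batch.
        z1, z2 = self.encoder(x1), self.encoder(x2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Symmetrized loss: each view predicts the other's detached output.
        return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

Note that no negative pairs, memory bank, or momentum encoder appears anywhere in the sketch; removing the `detach()` call is the quickest way to observe the collapsing behavior the abstract warns about.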
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes model learning by paying closer attention to training samples with a large difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
- Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework [43.76337849044254]
Self-supervised learning has shown its great potential to extract powerful visual representations without human annotations.
Various works have been proposed to deal with self-supervised learning from different perspectives.
We propose UniGrad, a simple but effective gradient form for self-supervised learning.
arXiv Detail & Related papers (2021-12-09T18:59:57Z)
- The Dimpled Manifold Model of Adversarial Examples in Machine Learning [6.6690527698171165]
In this paper we introduce a new conceptual framework which provides a simple explanation for why adversarial examples exist.
In the last part of the paper we describe the results of numerous experiments which strongly support this new model.
arXiv Detail & Related papers (2021-06-18T14:32:55Z)
- Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training [39.137793683411424]
We introduce the Layer-Peeled Model, a nonconvex yet analytically tractable optimization program.
We show that the model inherits many characteristics of well-trained networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training.
In particular, we show that the model reveals a hitherto unknown phenomenon that we term Minority Collapse, which fundamentally limits the performance of deep learning models on the minority classes.
arXiv Detail & Related papers (2021-01-29T17:37:17Z)
- A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation [63.042651834453544]
We show that the unsupervised learning of disentangled representations is impossible without inductive biases on both the models and the data.
We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision.
Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision.
arXiv Detail & Related papers (2020-10-27T10:17:15Z)
- Tackling Occlusion in Siamese Tracking with Structured Dropouts [42.303946665229965]
Occlusion is one of the most difficult challenges to model in object tracking.
We present structured dropout to mimic the change in latent codes under occlusion (see the sketch after this list).
Experiments on several tracking benchmarks show the benefits of structured dropouts.
arXiv Detail & Related papers (2020-06-30T07:09:33Z)
- Unsupervised Landmark Learning from Unpaired Data [117.81440795184587]
Recent attempts for unsupervised landmark learning leverage synthesized image pairs that are similar in appearance but different in poses.
We propose a cross-image cycle consistency framework which applies the swapping-reconstruction strategy twice to obtain the final supervision.
Our proposed framework is shown to outperform strong baselines by a large margin.
arXiv Detail & Related papers (2020-06-29T13:57:20Z)
- Weakly-Supervised Disentanglement Without Compromises [53.55580957483103]
Intelligent agents should be able to learn useful representations by observing changes in their environment.
We model such observations as pairs of non-i.i.d. images sharing at least one of the underlying factors of variation.
We show that only knowing how many factors have changed, but not which ones, is sufficient to learn disentangled representations.
arXiv Detail & Related papers (2020-02-07T16:39:31Z)
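As a complement to the structured-dropouts entry above, the following is a hypothetical sketch of one simple form of structured dropout on a Siamese branch's latent feature map: rather than dropping activations independently, a contiguous spatial block is zeroed so the perturbed code resembles a partially occluded target. The block size, drop probability, and shared per-batch mask are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical structured spatial dropout: zero a contiguous block of the
# latent feature map to imitate occlusion of part of the tracked object.
import torch

def structured_spatial_dropout(feat, block=3, p=0.5):
    # feat: (N, C, H, W) latent feature map from a Siamese branch.
    if torch.rand(()) >= p:
        return feat  # apply the structured drop only with probability p
    n, c, h, w = feat.shape
    top = torch.randint(0, max(h - block, 1), (1,)).item()
    left = torch.randint(0, max(w - block, 1), (1,)).item()
    mask = torch.ones_like(feat)
    # One shared block for the whole batch, for simplicity of the sketch.
    mask[:, :, top:top + block, left:left + block] = 0.0
    return feat * mask
```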
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.