Masked Siamese ConvNets
- URL: http://arxiv.org/abs/2206.07700v1
- Date: Wed, 15 Jun 2022 17:52:23 GMT
- Title: Masked Siamese ConvNets
- Authors: Li Jing, Jiachen Zhu, Yann LeCun
- Abstract summary: Self-supervised learning has shown superior performance to supervised methods on various vision benchmarks.
Masked siamese networks require a particular inductive bias and, in practice, work well only with Vision Transformers.
This work empirically studies the problems behind masked siamese networks with ConvNets.
- Score: 17.337143119620755
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Self-supervised learning has shown superior performance to supervised methods on various vision benchmarks. The siamese network, which encourages embeddings to be invariant to distortions, is one of the most successful self-supervised visual representation learning approaches. Among all augmentation methods, masking is the most general and straightforward: it can potentially be applied to all kinds of input and requires the least domain knowledge. However, masked siamese networks require a particular inductive bias and, in practice, work well only with Vision Transformers. This work empirically studies the problems behind masked siamese networks with ConvNets. We propose several empirical designs that gradually overcome these problems. Our method performs competitively on low-shot image classification and outperforms previous methods on object detection benchmarks. We discuss several remaining issues and hope this work can provide useful data points for future general-purpose self-supervised learning.
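Since masking as an augmentation is central to the paper, a minimal sketch of producing two masked views for a siamese ConvNet may be useful. It assumes a PyTorch pipeline; the patch size and mask ratio are illustrative choices, not the paper's settings.

```python
import torch

def random_patch_mask(x: torch.Tensor, patch_size: int = 32,
                      mask_ratio: float = 0.25) -> torch.Tensor:
    """Zero out a random subset of non-overlapping patches per image.

    x: (B, C, H, W) batch; H and W must be divisible by patch_size.
    """
    b, _, h, w = x.shape
    gh, gw = h // patch_size, w // patch_size
    # One independent keep/drop decision per patch, per image.
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > mask_ratio).float()
    # Upsample the patch-level mask to pixel resolution.
    keep = keep.repeat_interleave(patch_size, dim=2)
    keep = keep.repeat_interleave(patch_size, dim=3)
    return x * keep

# Siamese usage: two independently masked views of the same batch feed a
# shared encoder, and the loss encourages their embeddings to match.
images = torch.randn(8, 3, 224, 224)
view1, view2 = random_patch_mask(images), random_patch_mask(images)
```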
Related papers
- Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning [21.49630640829186]
In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning.
We propose a masked two-channel decoupling framework based on deep neural networks to solve this problem.
Our model is fully adaptable to arbitrary view and label absences while also performing well on the ideal full data.
arXiv Detail & Related papers (2024-04-26T11:39:50Z)
- Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning [33.55397868171977]
Appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques.
We propose a novel framework: subject-wise gaZE learning (SAZE), which trains a network to generalize the appearance of subjects.
Our experimental results verify the robustness of the method: it achieves state-of-the-art mean angular errors of 3.89 and 4.42 degrees on the MPIIGaze and EyeDiap datasets, respectively.
arXiv Detail & Related papers (2024-01-25T00:23:21Z)
- Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to ease the burden of incorporating the masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly take a saliency constraint into account so that masked regions are distributed more evenly between the foreground and background.
arXiv Detail & Related papers (2023-09-22T09:58:38Z)
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Despite this promise, unsupervised learning on diffusion-generated images remains under-explored.
We introduce customized solutions that fully exploit these free attention masks.
arXiv Detail & Related papers (2023-07-27T13:18:47Z)
- When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning [83.8758881342346]
A novel loss function is devised to generate adversarial perturbations that achieve both visual imperceptibility and imperceptibility to the evaluation measures.
Experiments on large-scale benchmark datasets demonstrate the superiority of our proposed method in attacking the top-$k$ multi-label systems.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can help mitigate the data hunger of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially trained mask generator with a mask-guided image modeling process; a minimal Gumbel-Softmax sketch follows this list.
In our experiments, AutoMAE provides effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- MixMask: Revisiting Masking Strategy for Siamese ConvNets [24.20212182301359]
We propose a filling-based masking strategy, MixMask, that prevents the information loss caused by randomly erased regions in an image; a sketch of the filling idea follows this list.
Our framework achieves superior accuracy on linear probing, semi-supervised, and supervised fine-tuning, outperforming the state-of-the-art MSCN by a significant margin.
arXiv Detail & Related papers (2022-10-20T17:54:03Z)
- Evaluating the Label Efficiency of Contrastive Self-Supervised Learning for Multi-Resolution Satellite Imagery [0.0]
Self-supervised learning has been applied in the remote sensing domain to exploit readily available unlabeled data.
In this paper, we study self-supervised visual representation learning through the lens of label efficiency.
arXiv Detail & Related papers (2022-10-13T06:54:13Z)
- What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z)
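For the AutoMAE entry above, here is a minimal sketch of differentiable mask sampling with Gumbel-Softmax; the two-logit parameterization and the 14x14 patch grid are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

num_patches = 196  # e.g. a 14x14 grid of image patches
# Two logits per patch: [keep, mask]; in AutoMAE-style training these would
# come from a learned mask generator rather than random initialization.
logits = torch.randn(num_patches, 2, requires_grad=True)

# hard=True returns one-hot samples in the forward pass while the backward
# pass uses the soft relaxation (straight-through), keeping logits trainable.
sample = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)
mask = sample[:, 1]  # 1.0 where a patch is masked

mask.sum().backward()  # gradients flow back into the mask logits
print(f"masked {int(mask.sum())} of {num_patches} patches")
```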
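For the MixMask entry above, here is a minimal sketch of filling-based masking: erased patches are filled with the corresponding patches of another image, so no region is left empty. The patch size, mask ratio, and roll-based pairing are illustrative assumptions.

```python
import torch

def mix_mask(x: torch.Tensor, patch_size: int = 32,
             mask_ratio: float = 0.5) -> torch.Tensor:
    """Replace a random subset of patches in each image with the matching
    patches of another image in the batch."""
    b, _, h, w = x.shape
    gh, gw = h // patch_size, w // patch_size
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > mask_ratio).float()
    keep = keep.repeat_interleave(patch_size, dim=2)
    keep = keep.repeat_interleave(patch_size, dim=3)
    filler = torch.roll(x, shifts=1, dims=0)  # pair each image with a neighbor
    return x * keep + filler * (1.0 - keep)

mixed = mix_mask(torch.randn(8, 3, 224, 224))
```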
This list is automatically generated from the titles and abstracts of the papers on this site.