Related papers: Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where

URL: http://arxiv.org/abs/2309.12757v2
Date: Sat, 8 Jun 2024 05:42:53 GMT
Title: Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where
Authors: Zhi-Yi Chin, Chieh-Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu
Abstract summary: We aim to alleviate the burden of including masking operation into the contrastive-learning framework for convolutional neural networks. We propose to explicitly take the saliency constraint into consideration in which the masked regions are more evenly distributed among the foreground and background.
Score: 63.61248884015162
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While image data starts to enjoy the simple-but-effective self-supervised learning scheme built upon masking and self-reconstruction objective thanks to the introduction of tokenization procedure and vision transformer backbone, convolutional neural networks as another important and widely-adopted architecture for image data, though having contrastive-learning techniques to drive the self-supervised learning, still face the difficulty of leveraging such straightforward and general masking operation to benefit their learning process significantly. In this work, we aim to alleviate the burden of including masking operation into the contrastive-learning framework for convolutional neural networks as an extra augmentation method. In addition to the additive but unwanted edges (between masked and unmasked regions) as well as other adverse effects caused by the masking operations for ConvNets, which have been discussed by prior works, we particularly identify the potential problem where for one view in a contrastive sample-pair the randomly-sampled masking regions could be overly concentrated on important/salient objects thus resulting in misleading contrastiveness to the other view. To this end, we propose to explicitly take the saliency constraint into consideration in which the masked regions are more evenly distributed among the foreground and background for realizing the masking-based augmentation. Moreover, we introduce hard negative samples by masking larger regions of salient patches in an input image. Extensive experiments conducted on various datasets, contrastive learning mechanisms, and downstream tasks well verify the efficacy as well as the superior performance of our proposed method with respect to several state-of-the-art baselines.
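Illustrative sketch: the abstract describes two masking ideas, (1) spreading masked regions evenly across foreground and background, and (2) building hard negatives by masking larger regions of salient patches. The Python below is only a minimal sketch of those two ideas, not the authors' implementation. It assumes per-patch saliency scores are already available as a flat list, and it assumes a median split into "foreground" and "background"; the function names, the `mask_ratio` parameter, and its default values are all hypothetical.

```python
import random


def saliency_balanced_mask(saliency, mask_ratio=0.3, seed=None):
    """Return one boolean per patch (True = masked).

    Assumption: patches at or above the median saliency count as
    foreground, the rest as background. The same fraction of each
    group is masked, so the mask cannot pile up on the salient object.
    """
    rng = random.Random(seed)
    # Patch indices sorted from least to most salient.
    order = sorted(range(len(saliency)), key=lambda i: saliency[i])
    half = len(order) // 2
    background, foreground = order[:half], order[half:]
    masked = set()
    for group in (foreground, background):
        k = round(mask_ratio * len(group))
        masked.update(rng.sample(group, k))
    return [i in masked for i in range(len(saliency))]


def hard_negative_mask(saliency, mask_ratio=0.6):
    """Mask the most salient patches to form a hard negative view.

    Assumption: "larger regions of salient patches" is modeled here as
    masking the top `mask_ratio` fraction of patches by saliency.
    """
    order = sorted(range(len(saliency)), key=lambda i: saliency[i])
    k = round(mask_ratio * len(order))
    masked = set(order[len(order) - k:])
    return [i in masked for i in range(len(saliency))]
```

With 16 patches and `mask_ratio=0.5`, `saliency_balanced_mask` masks 4 of the 8 foreground patches and 4 of the 8 background patches, whereas purely random masking could place all 8 masked patches on the object.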

Related papers

Rethinking Random Masking in Self Distillation on ViT [0.0]
This study examines the role of random masking in the self-distillation setting, focusing on the DINO framework. Specifically, we apply random masking exclusively to the student's global view, while preserving the student's local views and the teacher's global view in their original, unmasked forms. We evaluate our approach using DINO-Tiny on the mini-ImageNet dataset and show that random masking under this asymmetric setup yields more robust and fine-grained attention maps, ultimately enhancing downstream performance.
arXiv Detail & Related papers (2025-06-12T11:19:07Z)
Understanding Masked Autoencoders From a Local Contrastive Perspective [80.57196495601826]
Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. We introduce a new empirical framework, called Local Contrastive MAE, to analyze both reconstructive and contrastive aspects of MAE.
arXiv Detail & Related papers (2023-10-03T12:08:15Z)
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition. Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images. We introduce customized solutions by fully exploiting the aforementioned free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z)
Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data. We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process. In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
Improving self-supervised representation learning via sequential adversarial masking [12.176299580413097]
Masking-based pretext tasks extend beyond NLP, serving as useful pretraining objectives in computer vision. We propose a new framework that generates masks in a sequential fashion with different constraints on the adversary.
arXiv Detail & Related papers (2022-12-16T04:25:43Z)
Masked Siamese ConvNets [17.337143119620755]
Self-supervised learning has shown superior performances over supervised methods on various vision benchmarks. Masked siamese networks require particular inductive bias and practically only work well with Vision Transformers. This work empirically studies the problems behind masked siamese networks with ConvNets.
arXiv Detail & Related papers (2022-06-15T17:52:23Z)
What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum. Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks. We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z)
View Blind-spot as Inpainting: Self-Supervised Denoising with Mask Guided Residual Convolution [2.179313476241343]
We propose integrating a novel Mask Guided Residual Convolution (MGRConv) into common convolutional neural networks. MGRConv can be regarded as a soft partial convolution that finds a trade-off among partial convolution, learnable attention maps, and gated convolution. Experiments show that the proposed plug-and-play MGRConv can help blind-spot based denoising networks reach promising results.
arXiv Detail & Related papers (2021-09-10T16:10:08Z)
Face Anti-Spoofing Via Disentangled Representation Learning [90.90512800361742]
Face anti-spoofing is crucial to the security of face recognition systems. We propose a novel perspective on face anti-spoofing that disentangles the liveness features and content features from images.
arXiv Detail & Related papers (2020-08-19T03:54:23Z)
