Extreme Masking for Learning Instance and Distributed Visual
Representations
- URL: http://arxiv.org/abs/2206.04667v1
- Date: Thu, 9 Jun 2022 17:59:43 GMT
- Title: Extreme Masking for Learning Instance and Distributed Visual
Representations
- Authors: Zhirong Wu, Zihang Lai, Xiao Sun, Stephen Lin
- Abstract summary: The paper presents a scalable approach for learning distributed representations over individual tokens and a holistic instance representation simultaneously.
We use self-attention blocks to represent distributed tokens, followed by cross-attention blocks to aggregate the holistic instance.
Our model, named ExtreMA, follows the plain BYOL approach where the instance representation from the unmasked subset is trained to predict that from the intact input.
- Score: 50.152264456036114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper presents a scalable approach for learning distributed
representations over individual tokens and a holistic instance representation
simultaneously. We use self-attention blocks to represent distributed tokens,
followed by cross-attention blocks to aggregate the holistic instance. The core
of the approach is the use of extremely large token masking (75%-90%) as the
data augmentation for supervision. Our model, named ExtreMA, follows the plain
BYOL approach where the instance representation from the unmasked subset is
trained to predict that from the intact input. Learning requires the model to
capture informative variations in an instance, instead of encouraging
invariances. The paper makes three contributions: 1) Random masking is a strong
and computationally efficient data augmentation for learning generalizable
attention representations. 2) With multiple sampling per instance, extreme
masking greatly speeds up learning and hungers for more data. 3) Distributed
representations can be learned from the instance supervision alone, unlike
per-token supervisions in masked modeling.
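The extreme masking step described in the abstract (dropping 75%-90% of tokens and encoding only the visible remainder) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extreme_mask` and the use of NumPy are assumptions for the example.

```python
import numpy as np

def extreme_mask(num_tokens: int, mask_ratio: float = 0.75, rng=None):
    """Sample the indices of the small visible token subset.

    With mask_ratio in [0.75, 0.90] as in the abstract, only 10-25% of
    tokens are kept. Because the encoder runs on this subset alone,
    masking is a computationally cheap augmentation.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_keep = max(1, int(round(num_tokens * (1.0 - mask_ratio))))
    visible = rng.choice(num_tokens, size=num_keep, replace=False)
    return np.sort(visible)

# A 14x14 ViT patch grid has 196 tokens; at 90% masking, 20 stay visible.
vis = extreme_mask(196, mask_ratio=0.90)
```

Multiple such subsets can be drawn per image, which is how the paper obtains several training views from one instance at low cost.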
Related papers
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z) - MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model
for Few-Shot Instance Segmentation [31.648523213206595]
Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task.
Conventional approaches have attempted to address the task via prototype learning, known as point estimation.
We propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask.
arXiv Detail & Related papers (2023-03-09T08:24:02Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - Exploring Target Representations for Masked Autoencoders [78.57196600585462]
We show that a careful choice of the target representation is unnecessary for learning good representations.
We propose a multi-stage masked distillation pipeline and use a randomly initialized model as the teacher.
The proposed method, which performs masked knowledge distillation with bootstrapped teachers (dBOT), outperforms previous self-supervised methods by nontrivial margins.
arXiv Detail & Related papers (2022-09-08T16:55:19Z) - What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z) - Variance-reduced Language Pretraining via a Mask Proposal Network [5.819397109258169]
Self-supervised learning, a.k.a. pretraining, is important in natural language processing.
In this paper, we tackle the problem from the view of gradient variance reduction.
To improve efficiency, we introduce a MAsk Proposal Network (MAPNet), which approximates the optimal mask proposal distribution.
arXiv Detail & Related papers (2020-08-12T14:12:32Z) - PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance requires a heavy computing burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.