SMMix: Self-Motivated Image Mixing for Vision Transformers
- URL: http://arxiv.org/abs/2212.12977v1
- Date: Mon, 26 Dec 2022 00:19:39 GMT
- Title: SMMix: Self-Motivated Image Mixing for Vision Transformers
- Authors: Mengzhao Chen, Mingbao Lin, ZhiHang Lin, Yuxin Zhang, Fei Chao,
Rongrong Ji
- Abstract summary: CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs).
Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels.
We propose an efficient and effective Self-Motivated image Mixing method (SMMix) which motivates both image and label enhancement by the model under training itself.
- Score: 65.809376136455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: CutMix is a vital augmentation strategy that determines the performance and
generalization ability of vision transformers (ViTs). However, the
inconsistency between the mixed images and the corresponding labels harms its
efficacy. Existing CutMix variants tackle this problem by generating more
consistent mixed images or more precise mixed labels, but inevitably introduce
heavy training overhead or require extra information, undermining ease of use.
To this end, we propose an efficient and effective Self-Motivated image Mixing
method (SMMix), which motivates both image and label enhancement by the model
under training itself. Specifically, we propose a max-min attention region
mixing approach that enriches the attention-focused objects in the mixed
images. Then, we introduce a fine-grained label assignment technique that
co-trains the output tokens of mixed images with fine-grained supervision.
Moreover, we devise a novel feature consistency constraint to align features
from mixed and unmixed images. Owing to the subtle designs of this self-motivated
paradigm, our SMMix incurs smaller training overhead and achieves better
performance than other CutMix variants. In particular, SMMix improves the
accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on
ImageNet-1k. The generalization capability of our method is also demonstrated
on downstream tasks and out-of-distribution datasets. Code of this project is
available at https://github.com/ChenMnZ/SMMix.
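The abstract above names two concrete ingredients: region mixing guided by the attention of the model under training, and fine-grained (token-level) supervision of the mixed image. The sketch below is a rough, hedged illustration only: it pastes same-position content from a randomly paired image onto each target image's least-attended patches and derives both per-patch targets and an area-weighted soft label. The function name mix_with_attention, the patch-selection rule, and the label rules are assumptions of this sketch, not the authors' implementation; the max-min region pairing and the feature consistency constraint of the real method are omitted (see the official code linked above).

```python
# Hedged sketch of attention-guided, CutMix-style mixing with patch-level
# ("fine-grained") targets. Selection and label rules are illustrative
# assumptions, not the official SMMix procedure (https://github.com/ChenMnZ/SMMix).
import torch
import torch.nn.functional as F


def mix_with_attention(imgs, labels, cls_attn, num_classes, mix_frac=0.25, patch=16):
    """imgs: (B, 3, H, W); labels: (B,) int64 class ids;
    cls_attn: (B, N) CLS-to-patch attention from the model under training,
    with N = (H // patch) * (W // patch)."""
    B, _, H, W = imgs.shape
    gh, gw = H // patch, W // patch
    N = gh * gw
    k = max(1, int(mix_frac * N))                  # number of patches to replace
    perm = torch.randperm(B, device=imgs.device)   # donor image for each target

    # indices of the k least-attended patches in each target image
    low_idx = cls_attn.topk(k, dim=1, largest=False).indices       # (B, k)

    # binary patch mask: 1 where donor content is pasted in
    mask = torch.zeros(B, N, device=imgs.device)
    mask.scatter_(1, low_idx, 1.0)
    pix_mask = F.interpolate(mask.view(B, 1, gh, gw), size=(H, W), mode="nearest")

    mixed = imgs * (1 - pix_mask) + imgs[perm] * pix_mask

    # fine-grained per-patch targets: donor label on pasted patches, else own label
    patch_targets = torch.where(mask.bool(), labels[perm, None], labels[:, None])

    # image-level soft label weighted by how many patches each source kept
    lam = 1.0 - k / N
    onehot = F.one_hot(labels, num_classes).float()
    soft_label = lam * onehot + (1.0 - lam) * onehot[perm]
    return mixed, patch_targets, soft_label
```

In a training loop, the per-patch targets would supervise the output tokens of the mixed image and the soft label its class token; how SMMix actually couples these losses is specified in the paper and repository, not here.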
Related papers
- SUMix: Mixup with Semantic and Uncertain Information [41.99721365685618]
Mixup data augmentation approaches have been applied to various deep learning tasks.
We propose a novel approach named SUMix to learn the mixing ratio as well as the uncertainty for the mixed samples during the training process.
arXiv Detail & Related papers (2024-07-10T16:25:26Z)
- Rethinking Mixup for Improving the Adversarial Transferability [6.2867306093287905]
We propose a new input transformation-based attack called Mixing the Image but Separating the gradienT (MIST).
MIST randomly mixes the input image with a randomly shifted image and separates the gradient of each loss item for each mixed image.
Experiments on the ImageNet dataset demonstrate that MIST outperforms existing SOTA input transformation-based attacks.
arXiv Detail & Related papers (2023-11-28T03:10:44Z)
- SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification [46.8141860303439]
We introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix.
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.
arXiv Detail & Related papers (2023-11-26T05:45:27Z)
- MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer [17.012278767127967]
We propose MaskMix and Progressive Attention Labeling (PAL) in image and label space.
From the perspective of image space, we design MaskMix, which mixes two images based on a patch-like grid mask; a minimal grid-mask sketch appears after this list.
From the perspective of label space, we design PAL, which utilizes a progressive factor to dynamically re-weight the attention weights of the mixed attention label.
arXiv Detail & Related papers (2023-04-24T12:38:09Z)
- Mixed Autoencoder for Self-supervised Visual Representation Learning [95.98114940999653]
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks via randomly masking image patches and reconstruction.
This paper studies the prevailing mixing augmentation for MAE.
arXiv Detail & Related papers (2023-03-30T05:19:43Z)
- OAMixer: Object-aware Mixing Layer for Vision Transformers [73.10651373341933]
We propose OAMixer, which calibrates the patch mixing layers of patch-based models based on the object labels.
By learning an object-centric representation, we demonstrate that OAMixer improves the classification accuracy and background robustness of various patch-based models.
arXiv Detail & Related papers (2022-12-13T14:14:48Z)
- ResizeMix: Mixing Data with Preserved Object Information and True Labels [57.00554495298033]
We study the importance of saliency information for mixing data, and find that saliency information is not essential for improving augmentation performance.
We propose a more effective but very easily implemented method, namely ResizeMix.
arXiv Detail & Related papers (2020-12-21T03:43:13Z)
- SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data [124.95585891086894]
The proposed method is called Semantically Proportional Mixing (SnapMix).
It exploits class activation map (CAM) to lessen the label noise in augmenting fine-grained data.
Our method consistently outperforms existing mixing-based approaches.
arXiv Detail & Related papers (2020-12-09T03:37:30Z)
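Several entries above mix images under a patch-aligned mask; as referenced in the MixPro entry, the sketch below shows one minimal way to mix two images with a random patch-like grid mask. The function name grid_mask_mix and the Bernoulli mask-sampling rule are assumptions of this sketch, not the MaskMix design described in the MixPro paper.

```python
# Minimal, hedged sketch of mixing two images under a random patch-aligned grid
# mask. The sampling rule here is an assumption, not MixPro's actual MaskMix.
import torch
import torch.nn.functional as F


def grid_mask_mix(img_a, img_b, patch=16, keep_prob=0.5):
    """img_a, img_b: (3, H, W) tensors with H and W divisible by `patch`.
    Returns the mixed image and lam, the fraction of pixels kept from img_a."""
    _, H, W = img_a.shape
    gh, gw = H // patch, W // patch
    grid = (torch.rand(1, 1, gh, gw) < keep_prob).float()       # patch-level mask
    mask = F.interpolate(grid, size=(H, W), mode="nearest")[0]  # (1, H, W) pixels
    mixed = img_a * mask + img_b * (1.0 - mask)
    lam = mask.mean().item()
    return mixed, lam
```

The mixed label would typically be lam * y_a + (1 - lam) * y_b; MixPro's PAL additionally re-weights attention-based labels with a progressive factor, which this sketch does not model.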
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.