TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
- URL: http://arxiv.org/abs/2210.07562v1
- Date: Fri, 14 Oct 2022 06:36:31 GMT
- Title: TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
- Authors: Hyeong Kyu Choi, Joonmyung Choi, Hyunwoo J. Kim
- Abstract summary: TokenMixup is an efficient attention-guided token-level data augmentation method.
A variant of TokenMixup mixes tokens within a single instance, thereby enabling multi-scale feature augmentation.
Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K.
- Score: 8.099977107670917
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup is a commonly adopted data augmentation technique for image
classification. Recent advances in mixup methods primarily focus on mixing
based on saliency. However, many saliency detectors require intense computation
and are especially burdensome for parameter-heavy transformer models. To this
end, we propose TokenMixup, an efficient attention-guided token-level data
augmentation method that aims to maximize the saliency of a mixed set of
tokens. TokenMixup provides 15x faster saliency-aware data augmentation
compared to gradient-based methods. Moreover, we introduce a variant of
TokenMixup which mixes tokens within a single instance, thereby enabling
multi-scale feature augmentation. Experiments show that our methods
significantly improve the baseline models' performance on CIFAR and
ImageNet-1K, while being more efficient than previous methods. We also reach
state-of-the-art performance on CIFAR-100 among from-scratch transformer
models. Code is available at https://github.com/mlvlab/TokenMixup.
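To make the idea concrete, below is a minimal, hypothetical sketch of attention-guided token-level mixing in PyTorch. It is not the authors' implementation (see the linked repository for that); it only illustrates using per-token attention scores, rather than gradient-based saliency, to decide which tokens of one sample to replace with salient tokens of another, mixing the label in proportion to the tokens exchanged. All function and argument names are illustrative assumptions.

```python
import torch

def attention_guided_token_mixup(tokens_a, tokens_b, attn_a, attn_b,
                                 labels_a, labels_b, ratio=0.5):
    """Hypothetical sketch: replace the least-attended tokens of sample A with
    the most-attended tokens of sample B, using attention scores as a cheap,
    gradient-free saliency signal.

    tokens_*: (N, D) token embeddings; attn_*: (N,) per-token attention scores;
    labels_*: (C,) one-hot or soft labels; ratio: fraction of tokens to mix.
    """
    n = tokens_a.size(0)
    k = int(n * ratio)                                    # tokens to exchange
    low_a = torch.argsort(attn_a)[:k]                     # least salient positions in A
    high_b = torch.argsort(attn_b, descending=True)[:k]   # most salient tokens in B

    mixed_tokens = tokens_a.clone()
    mixed_tokens[low_a] = tokens_b[high_b]                # token-level mixing

    lam = 1.0 - k / n                                     # label weight tracks token count
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed_tokens, mixed_labels
```

The actual method additionally formulates token selection as maximizing the saliency of the mixed token set, and its variant mixes tokens within a single instance for multi-scale feature augmentation; neither detail is captured by this sketch.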
Related papers
- Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding [54.532578213126065]
Most document understanding methods preserve all tokens within sub-images and treat them equally.
This neglects their different informativeness and leads to a significant increase in the number of image tokens.
We propose Token-level Correlation-guided Compression, a parameter-free and plug-and-play methodology to optimize token processing.
arXiv Detail & Related papers (2024-07-19T16:11:15Z)
- TransformMix: Learning Transformation and Mixing Strategies from Data [20.79680733590554]
We propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data.
We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings.
arXiv Detail & Related papers (2024-03-19T04:36:41Z)
- Adversarial AutoMixup [50.1874436169571]
We propose AdAutomixup, an adversarial automatic mixup augmentation approach.
It generates challenging samples to train a robust classifier for image classification.
Our approach outperforms the state of the art in various classification scenarios.
arXiv Detail & Related papers (2023-12-19T08:55:00Z)
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- Token-Label Alignment for Vision Transformers [93.58540411138164]
Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs).
We identify a token fluctuation phenomenon that has suppressed the potential of data mixing strategies.
We propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token.
arXiv Detail & Related papers (2022-10-12T17:54:32Z)
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training sample.
It then uses the perturbed data and the original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
- TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers [36.630476419392046]
CutMix is a popular augmentation technique commonly used for training modern convolutional and transformer vision networks.
We propose a novel data augmentation technique TokenMix to improve the performance of vision transformers.
arXiv Detail & Related papers (2022-07-18T07:08:29Z)
- MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers [35.26148770111607]
Mixed and Masked AutoEncoder (MixMAE) is a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers.
This paper explores using Swin Transformer with a large window size and scales up to huge model size (to reach 600M parameters). Notably, MixMAE with Swin-B/W14 achieves 85.1% top-1 accuracy on ImageNet-1K by pretraining for 600 epochs.
arXiv Detail & Related papers (2022-05-26T04:00:42Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a data augmentation technique that linearly interpolates input examples and their corresponding labels (a minimal sketch follows this list).
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup to transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
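Several of the entries above, including Mixup-Transformer, build directly on the vanilla Mixup formulation: a convex combination of two inputs and their labels with a coefficient drawn from a Beta distribution. A minimal reference sketch follows; the helper name and alpha value are illustrative assumptions, not taken from any of the papers.

```python
import torch

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Vanilla Mixup: x = lam*x1 + (1-lam)*x2, y = lam*y1 + (1-lam)*y2,
    with lam sampled from Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```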
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.