TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
- URL: http://arxiv.org/abs/2210.07562v1
- Date: Fri, 14 Oct 2022 06:36:31 GMT
- Title: TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
- Authors: Hyeong Kyu Choi, Joonmyung Choi, Hyunwoo J. Kim
- Abstract summary: TokenMixup is an efficient attention-guided token-level data augmentation method.
A variant of TokenMixup mixes tokens within a single instance, thereby enabling multi-scale feature augmentation.
Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K.
- Score: 8.099977107670917
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup is a commonly adopted data augmentation technique for image
classification. Recent advances in mixup methods primarily focus on mixing
based on saliency. However, many saliency detectors require intense computation
and are especially burdensome for parameter-heavy transformer models. To this
end, we propose TokenMixup, an efficient attention-guided token-level data
augmentation method that aims to maximize the saliency of a mixed set of
tokens. TokenMixup provides 15x faster saliency-aware data augmentation
compared to gradient-based methods. Moreover, we introduce a variant of
TokenMixup which mixes tokens within a single instance, thereby enabling
multi-scale feature augmentation. Experiments show that our methods
significantly improve the baseline models' performance on CIFAR and
ImageNet-1K, while being more efficient than previous methods. We also reach
state-of-the-art performance on CIFAR-100 among from-scratch transformer
models. Code is available at https://github.com/mlvlab/TokenMixup.
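To make the idea concrete, below is a minimal, hypothetical sketch of attention-guided token-level mixing in PyTorch. It is not the authors' implementation (see the linked repository for that); it only illustrates using per-token attention scores, rather than gradient-based saliency, to decide which tokens of one sample to replace with salient tokens of another, mixing the label in proportion to the tokens exchanged. All function and argument names are illustrative assumptions.

```python
import torch

def attention_guided_token_mixup(tokens_a, tokens_b, attn_a, attn_b,
                                 labels_a, labels_b, ratio=0.5):
    """Hypothetical sketch: replace the least-attended tokens of sample A with
    the most-attended tokens of sample B, using attention scores as a cheap,
    gradient-free saliency signal.

    tokens_*: (N, D) token embeddings; attn_*: (N,) per-token attention scores;
    labels_*: (C,) one-hot or soft labels; ratio: fraction of tokens to mix.
    """
    n = tokens_a.size(0)
    k = int(n * ratio)                                    # tokens to exchange
    low_a = torch.argsort(attn_a)[:k]                     # least salient positions in A
    high_b = torch.argsort(attn_b, descending=True)[:k]   # most salient tokens in B

    mixed_tokens = tokens_a.clone()
    mixed_tokens[low_a] = tokens_b[high_b]                # token-level mixing

    lam = 1.0 - k / n                                     # label weight tracks token count
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed_tokens, mixed_labels
```

The actual method additionally formulates token selection as maximizing the saliency of the mixed token set, and its variant mixes tokens within a single instance for multi-scale feature augmentation; neither detail is captured by this sketch.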
Related papers
- Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding [54.532578213126065]
Most document understanding methods preserve all tokens within sub-images and treat them equally.
This neglects their different informativeness and leads to a significant increase in the number of image tokens.
We propose Token-level Correlation-guided Compression, a parameter-free and plug-and-play methodology to optimize token processing.
arXiv Detail & Related papers (2024-07-19T16:11:15Z)
- TransformMix: Learning Transformation and Mixing Strategies from Data [20.79680733590554]
We propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data.
We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings.
arXiv Detail & Related papers (2024-03-19T04:36:41Z)
- Adversarial AutoMixup [50.1874436169571]
We propose AdAutomixup, an adversarial automatic mixup augmentation approach.
It generates challenging samples to train a robust classifier for image classification.
Our approach outperforms the state of the art in various classification scenarios.
arXiv Detail & Related papers (2023-12-19T08:55:00Z)
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- Token-Label Alignment for Vision Transformers [93.58540411138164]
Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs).
We identify a token fluctuation phenomenon that has suppressed the potential of data mixing strategies.
We propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token.
arXiv Detail & Related papers (2022-10-12T17:54:32Z)
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training sample.
It then uses the perturbed data and the original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
- TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers [36.630476419392046]
CutMix is a popular augmentation technique commonly used for training modern convolutional and transformer vision networks.
We propose a novel data augmentation technique TokenMix to improve the performance of vision transformers.
arXiv Detail & Related papers (2022-07-18T07:08:29Z)
- MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers [35.26148770111607]
Mixed and Masked AutoEncoder (MixMAE) is a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers.
This paper explores using Swin Transformer with a large window size and scales up to huge model size (to reach 600M parameters). Notably, MixMAE with Swin-B/W14 achieves 85.1% top-1 accuracy on ImageNet-1K by pretraining for 600 epochs.
arXiv Detail & Related papers (2022-05-26T04:00:42Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a data augmentation technique that linearly interpolates input examples and their corresponding labels (a minimal sketch follows this list).
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup to transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
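Several of the entries above, including Mixup-Transformer, build directly on the vanilla Mixup formulation: a convex combination of two inputs and their labels with a coefficient drawn from a Beta distribution. A minimal reference sketch follows; the helper name and alpha value are illustrative assumptions, not taken from any of the papers.

```python
import torch

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Vanilla Mixup: x = lam*x1 + (1-lam)*x2, y = lam*y1 + (1-lam)*y2,
    with lam sampled from Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```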
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.