WaveMix: Resource-efficient Token Mixing for Images
- URL: http://arxiv.org/abs/2203.03689v1
- Date: Mon, 7 Mar 2022 20:15:17 GMT
- Title: WaveMix: Resource-efficient Token Mixing for Images
- Authors: Pranav Jeevan and Amit Sethi
- Abstract summary: We present WaveMix as an alternative neural architecture that uses a multi-scale 2D discrete wavelet transform (DWT) for spatial token mixing.
WaveMix has achieved state-of-the-art (SOTA) results on the EMNIST ByClass and EMNIST Balanced datasets.
- Score: 2.7188347260210466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although certain vision transformer (ViT) and CNN architectures generalize
well on vision tasks, it is often impractical to use them on green, edge, or
desktop computing due to their computational requirements for training and even
testing. We present WaveMix as an alternative neural architecture that uses a
multi-scale 2D discrete wavelet transform (DWT) for spatial token mixing.
Unlike ViTs, WaveMix neither unrolls the image nor requires self-attention of
quadratic complexity. Additionally, DWT introduces another inductive bias --
besides convolutional filtering -- to utilize the 2D structure of an image to
improve generalization. The multi-scale nature of the DWT also reduces the
requirement for a deeper architecture compared to CNNs, which rely on pooling
for only partial spatial mixing. WaveMix models show generalization that is
competitive with ViTs, CNNs, and token mixers on several datasets while
requiring less GPU RAM (for training and testing), fewer computations, and
less storage. WaveMix has achieved state-of-the-art (SOTA) results on the
EMNIST ByClass and EMNIST Balanced datasets.
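The token-mixing mechanism is simple enough to prototype directly. Below is a minimal PyTorch sketch: a one-level Haar DWT implemented as fixed strided depthwise convolutions, a pointwise MLP over the concatenated subbands, and nearest-neighbor upsampling back to the input resolution. All module names, layer sizes, and the upsampling choice are illustrative assumptions; the published model uses a multi-scale DWT and differs in detail.

```python
import torch
import torch.nn as nn


class HaarDWT(nn.Module):
    """One-level 2D Haar DWT as a fixed depthwise conv with stride 2.

    Maps (B, C, H, W) -> (B, 4*C, H/2, W/2); per-channel LL, LH, HL, HH
    subbands, interleaved. Assumes even H and W.
    """

    def __init__(self, channels: int):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        kernels = torch.stack([ll, lh, hl, hh])                # (4, 2, 2)
        # One set of 4 analysis filters per input channel (groups=C).
        weight = kernels.repeat(channels, 1, 1).unsqueeze(1)   # (4C, 1, 2, 2)
        self.register_buffer("weight", weight)
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.conv2d(x, self.weight, stride=2, groups=self.channels)


class WaveMixBlock(nn.Module):
    """Illustrative WaveMix-style block: DWT token mixing + MLP + upsample."""

    def __init__(self, channels: int, mult: int = 2):
        super().__init__()
        self.dwt = HaarDWT(channels)
        self.mlp = nn.Sequential(                  # pointwise channel mixing
            nn.Conv2d(4 * channels, mult * channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(mult * channels, channels, kernel_size=1),
        )
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Spatial mixing happens inside the DWT; the MLP mixes channels.
        return self.norm(x + self.up(self.mlp(self.dwt(x))))


x = torch.randn(1, 32, 64, 64)
print(WaveMixBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```

Because the DWT filters are fixed, the only learned parameters are in the pointwise MLP, which is where the resource savings over self-attention come from.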
Related papers
- WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency [4.093503153499691]
We present an enhanced version of the WaveMixSR architecture by replacing the traditional convolution layer with a pixel shuffle operation.
Our experiments demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other architectures in multiple super-resolution tasks.
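The pixel-shuffle idea is easy to illustrate: a cheap convolution produces scale^2 times more channels, and `nn.PixelShuffle` rearranges them into a higher-resolution grid. A hedged sketch comparing it against a transposed-convolution upsampler; the layer sizes here are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

scale = 2
in_ch, out_ch = 64, 3

# Transposed-convolution upsampler (one conventional alternative).
deconv_up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=scale, stride=scale)

# Pixel-shuffle upsampler: a convolution produces scale**2 channel groups,
# which PixelShuffle rearranges into a (scale*H, scale*W) grid.
shuffle_up = nn.Sequential(
    nn.Conv2d(in_ch, out_ch * scale**2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),
)

x = torch.randn(1, in_ch, 32, 32)
print(deconv_up(x).shape, shuffle_up(x).shape)  # both (1, 3, 64, 64)
```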
arXiv Detail & Related papers (2024-09-16T04:16:52Z)
- TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition [71.6546914957701]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way.
We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network.
In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost.
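The exact D-Mixer is more involved than a summary can convey; the toy sketch below only illustrates the general pattern of fusing a local branch and a global branch with input-dependent gates. Every name and size in it is hypothetical.

```python
import torch
import torch.nn as nn


class ToyDualMixer(nn.Module):
    """Toy dual-branch mixer: local depthwise conv + global average context,
    fused by input-dependent gates. Illustrative only, not the D-Mixer."""

    def __init__(self, dim: int):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)  # local details
        self.global_proj = nn.Conv2d(dim, dim, 1)                   # global context
        self.gate = nn.Sequential(                                  # input-dependent
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, 2, 1),
            nn.Softmax(dim=1),                # convex weights over the two branches
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate(x)                                            # (B, 2, 1, 1)
        glob = self.global_proj(x.mean(dim=(2, 3), keepdim=True)).expand_as(x)
        return g[:, :1] * self.local(x) + g[:, 1:] * glob


x = torch.randn(2, 48, 14, 14)
print(ToyDualMixer(48)(x).shape)  # torch.Size([2, 48, 14, 14])
```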
arXiv Detail & Related papers (2023-10-30T09:35:56Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
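A common way to make attention distance-aware is to penalize the attention logits by pairwise spatial distance. The sketch below shows only that generic idea; it is not the paper's Distance-based Weighted Transformer, and `alpha` is a hypothetical scale.

```python
import torch


def distance_biased_attention(q, k, v, coords, alpha=0.1):
    """Attention whose logits are penalized by pairwise spatial distance.

    q, k, v: (B, N, D) token features; coords: (N, 2) token positions.
    Generic illustration of distance weighting, not the paper's block.
    """
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / d**0.5       # (B, N, N)
    dist = torch.cdist(coords, coords)              # (N, N) Euclidean distances
    weights = torch.softmax(logits - alpha * dist, dim=-1)
    return weights @ v


B, N, D = 1, 16, 32
ys, xs = torch.meshgrid(torch.arange(4.0), torch.arange(4.0), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1)   # 4x4 token grid
q = k = v = torch.randn(B, N, D)
print(distance_biased_attention(q, k, v, coords).shape)      # (1, 16, 32)
```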
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.
By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
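The change of representation is the essential step: diffuse and denoise DWT coefficients instead of pixels. A minimal round-trip sketch with PyWavelets; the DDPM itself is omitted, and only the `pywt` calls are standard, the rest is illustrative.

```python
import numpy as np
import pywt

img = np.random.rand(64, 64).astype(np.float32)

# Forward DWT: pixel space -> wavelet space (one level, Haar).
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
# A wavelet-space DDPM would add noise to and denoise these coefficient
# maps (each 32x32 here) instead of the 64x64 pixel array.
print(cA.shape, cH.shape, cV.shape, cD.shape)  # all (32, 32)

# Inverse DWT maps the (de)noised coefficients back to an image.
recon = pywt.idwt2((cA, (cH, cV, cD)), "haar")
print(np.allclose(img, recon, atol=1e-6))      # True: perfect reconstruction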
arXiv Detail & Related papers (2023-07-27T06:53:16Z)
- WaveMixSR: A Resource-efficient Neural Network for Image Super-resolution [2.0477182014909205]
We propose a new neural network -- WaveMixSR -- for image super-resolution based on WaveMix architecture.
WaveMixSR achieves competitive performance on all datasets and reaches state-of-the-art performance on the BSD100 dataset across multiple super-resolution tasks.
arXiv Detail & Related papers (2023-07-01T21:25:03Z)
- WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint.
It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trainable discriminator.
arXiv Detail & Related papers (2023-07-01T18:41:34Z)
- FFT-based Dynamic Token Mixer for Vision [5.439020425819001]
We propose a novel token-mixer called Dynamic Filter and novel image recognition models, DFFormer and CDFFormer.
Our results indicate that Dynamic Filter is one of the token-mixer options that should be seriously considered.
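In broad strokes, a dynamic frequency-domain token mixer transforms features with an FFT, multiplies by a filter generated from the input itself, and transforms back. The toy sketch below follows that recipe but is not the paper's Dynamic Filter; the hypernetwork and all shapes are assumptions.

```python
import torch
import torch.nn as nn


class ToyDynamicFFTMixer(nn.Module):
    """Toy FFT token mixer with an input-dependent (dynamic) filter.

    Illustrative only; the paper's Dynamic Filter module differs in detail.
    """

    def __init__(self, dim: int, h: int, w: int):
        super().__init__()
        w_freq = w // 2 + 1                     # rfft2 keeps half the columns
        # Tiny hypernetwork: pooled features -> per-channel filter scale.
        self.to_scale = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        # Learnable complex base filter, stored as (real, imag) pairs.
        self.base = nn.Parameter(torch.randn(dim, h, w_freq, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        scale = self.to_scale(x.mean(dim=(2, 3)))           # (B, C), dynamic
        filt = torch.view_as_complex(self.base)             # (C, H, W//2+1)
        X = torch.fft.rfft2(x, norm="ortho")                # to frequency domain
        X = X * filt * scale[:, :, None, None]              # dynamic filtering
        return torch.fft.irfft2(X, s=(H, W), norm="ortho")  # back to spatial


x = torch.randn(2, 16, 8, 8)
print(ToyDynamicFFTMixer(16, 8, 8)(x).shape)  # torch.Size([2, 16, 8, 8])
```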
arXiv Detail & Related papers (2023-03-07T14:38:28Z)
- Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
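Edge-aware pre-training generally means supplying edge maps as auxiliary supervision. The sketch below shows one standard way to derive such targets with Sobel filters; the paper's actual pre-training task may differ, and the tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F


def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """Edge-magnitude map of a (B, 1, H, W) image via Sobel filters.

    One standard way to build edge targets for edge-aware pre-training.
    """
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()                                   # Sobel y is the transpose
    k = torch.stack([kx, ky]).unsqueeze(1)        # (2, 1, 3, 3)
    g = F.conv2d(img, k, padding=1)               # x- and y-gradients
    return g.pow(2).sum(dim=1, keepdim=True).sqrt()


mr_slice = torch.rand(4, 1, 128, 128)             # toy stand-in for MR slices
print(sobel_edges(mr_slice).shape)                # torch.Size([4, 1, 128, 128])
```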
arXiv Detail & Related papers (2022-12-02T11:40:40Z)
- WaveMix: A Resource-efficient Neural Network for Image Analysis [3.4927288761640565]
WaveMix is resource-efficient and yet generalizable and scalable.
WaveMix networks achieve accuracy comparable to or better than state-of-the-art convolutional neural networks.
WaveMix establishes new benchmarks for segmentation on Cityscapes.
arXiv Detail & Related papers (2022-05-28T09:08:50Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
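GFNet's log-linear complexity comes from the FFT: the layer replaces self-attention with an elementwise product between the spectrum and a learnable frequency-domain filter. A minimal sketch of such a layer; the dimensions are illustrative.

```python
import torch
import torch.nn as nn


class GlobalFilterLayer(nn.Module):
    """GFNet-style global filter: FFT -> learnable elementwise filter -> iFFT.

    Mixing all token pairs costs O(N log N) via the FFT instead of the
    O(N^2) of self-attention.
    """

    def __init__(self, dim: int, h: int, w: int):
        super().__init__()
        # Learnable complex filter, stored as (real, imag) pairs.
        self.filter = nn.Parameter(torch.randn(dim, h, w // 2 + 1, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        X = torch.fft.rfft2(x, norm="ortho")
        X = X * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(X, s=(H, W), norm="ortho")


x = torch.randn(2, 64, 14, 14)
print(GlobalFilterLayer(64, 14, 14)(x).shape)  # torch.Size([2, 64, 14, 14])
```

Unlike the dynamic mixer sketched earlier, the filter here is a static learned parameter shared across all inputs.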
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Wavelet Integrated CNNs for Noise-Robust Image Classification [51.18193090255933]
We enhance CNNs by replacing max-pooling, strided convolution, and average-pooling with the Discrete Wavelet Transform (DWT).
WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.
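Replacing pooling with a DWT amounts to downsampling with the low-frequency subband instead of a max or mean, since noise concentrates in the high-frequency subbands. A minimal Haar low-pass pooling module in that spirit; this is illustrative, and the WaveCNets variants differ in how they treat the remaining subbands.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HaarLowPassPool(nn.Module):
    """Downsample by keeping only the Haar LL (low-frequency) subband.

    Drop-in replacement for stride-2 max/average pooling that discards
    the noise-dominated high-frequency subbands.
    """

    def __init__(self, channels: int):
        super().__init__()
        ll = torch.full((channels, 1, 2, 2), 0.5)   # Haar low-pass kernel
        self.register_buffer("weight", ll)
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.weight, stride=2, groups=self.channels)


x = torch.randn(1, 16, 32, 32)
print(HaarLowPassPool(16)(x).shape)  # torch.Size([1, 16, 16, 16])
print(nn.MaxPool2d(2)(x).shape)      # same shape: torch.Size([1, 16, 16, 16])
```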
arXiv Detail & Related papers (2020-05-07T09:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.