MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image
Translation
- URL: http://arxiv.org/abs/2105.14110v1
- Date: Fri, 28 May 2021 21:12:52 GMT
- Title: MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image
Translation
- Authors: George Cazenavette, Manuel Ladron De Guevara
- Abstract summary: We propose a new unpaired image-to-image translation model called MixerGAN.
We show that MixerGAN achieves competitive results when compared to prior convolutional-based methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While attention-based transformer networks achieve unparalleled success in
nearly all language tasks, the large number of tokens coupled with the
quadratic activation memory usage makes them prohibitive for visual tasks. As
such, while language-to-language translation has been revolutionized by the
transformer model, convolutional networks remain the de facto solution for
image-to-image translation. The recently proposed MLP-Mixer architecture
alleviates some of the speed and memory issues associated with attention-based
networks while still retaining the long-range connections that make transformer
models desirable. Leveraging this efficient alternative to self-attention, we
propose a new unpaired image-to-image translation model called MixerGAN: a
simpler MLP-based architecture that considers long-distance relationships
between pixels without the need for expensive attention mechanisms.
Quantitative and qualitative analysis shows that MixerGAN achieves competitive
results when compared to prior convolutional-based methods.
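The Mixer block the abstract leans on can be sketched concretely. Below is a minimal, illustrative numpy version of one MLP-Mixer block (token-mixing MLP followed by channel-mixing MLP); all layer sizes, the GELU approximation, and the initialization are assumptions for illustration, not MixerGAN's actual configuration. The point is that the token-mixing MLP gives every patch a view of every other patch without materializing a tokens-by-tokens attention map.

```python
# Minimal sketch of a single MLP-Mixer block, the building block MixerGAN
# uses in place of self-attention. Shapes and sizes are illustrative only.
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, w2):
    return gelu(x @ w1) @ w2

def mixer_block(x, params):
    # x: (tokens, channels). Token mixing: the MLP is applied across the
    # token axis (via transpose), mixing information between all patches --
    # the long-range connections that otherwise require attention.
    y = x + mlp(layer_norm(x).T, params["tok_w1"], params["tok_w2"]).T
    # Channel mixing: the MLP is applied independently at each token.
    return y + mlp(layer_norm(y), params["ch_w1"], params["ch_w2"])

rng = np.random.default_rng(0)
tokens, channels, hidden = 64, 32, 128  # e.g. an 8x8 grid of patch embeddings
params = {
    "tok_w1": rng.standard_normal((tokens, hidden)) * 0.02,
    "tok_w2": rng.standard_normal((hidden, tokens)) * 0.02,
    "ch_w1":  rng.standard_normal((channels, hidden)) * 0.02,
    "ch_w2":  rng.standard_normal((hidden, channels)) * 0.02,
}
x = rng.standard_normal((tokens, channels))
out = mixer_block(x, params)
print(out.shape)  # (64, 32)
```

Note the contrast with attention: here no tokens-by-tokens activation is ever formed, which is the memory saving the abstract refers to.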
Related papers
- Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring [25.36888929483233]
We propose a multi-scale network based on single-input and multiple-outputs(SIMO) for motion deblurring.
We combine the characteristics of real-world trajectories with a learnable wavelet transform module to focus on the directional continuity and frequency features of the step-by-step transitions between blurred images to sharp images.
arXiv Detail & Related papers (2023-12-29T02:59:40Z)
- Smooth image-to-image translations with latent space interpolations [64.8170758294427]
Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain.
We show that our regularization techniques can improve the state-of-the-art I2I translations by a large margin.
arXiv Detail & Related papers (2022-10-03T11:57:30Z)
- PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation [84.97160975101718]
We propose a novel generative adversarial network, PI-Trans, which consists of a novel Parallel-ConvMLP module and an Implicit Transformation module at multiple semantic levels.
PI-Trans achieves the best qualitative and quantitative performance by a large margin compared to the state-of-the-art methods on two challenging datasets.
arXiv Detail & Related papers (2022-07-09T10:35:44Z)
- ITTR: Unpaired Image-to-Image Translation with Transformers [34.118637795470875]
We propose an effective and efficient architecture for unpaired Image-to-Image Translation with Transformers (ITTR).
ITTR has two main designs: 1) a hybrid perception block (HPB) for token mixing from different receptive fields to utilize global semantics; 2) dual pruned self-attention (DPSA) to sharply reduce the computational complexity.
Our ITTR outperforms the state-of-the-arts for unpaired image-to-image translation on six benchmark datasets.
arXiv Detail & Related papers (2022-03-30T02:46:12Z)
- Image-to-Image MLP-mixer for Image Reconstruction [23.036592718421105]
We show that a simple network based on the multi-layer perceptron (MLP)-mixer enables state-of-the-art image reconstruction performance without convolutions.
The image-to-image mixer is based exclusively on MLPs operating on linearly-transformed image patches.
It also outperforms the vision transformer for image reconstruction and classical untrained methods such as BM3D.
arXiv Detail & Related papers (2022-02-04T08:36:34Z)
- MAXIM: Multi-Axis MLP for Image Processing [19.192826213493838]
We present a multi-axis based architecture, called MAXIM, that can serve as an efficient general-purpose vision backbone for image processing tasks.
MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs.
Results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks.
arXiv Detail & Related papers (2022-01-09T09:59:32Z)
- Long-Short Transformer: Efficient Transformers for Language and Vision [97.2850205384295]
Long-Short Transformer (Transformer-LS) is an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks.
It aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations.
Our method outperforms the state-of-the-art models on multiple tasks in language and vision domains, including the Long Range Arena benchmark, autoregressive language modeling, and ImageNet classification.
arXiv Detail & Related papers (2021-07-05T18:00:14Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
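The "transposed" attention this summary describes can be illustrated with a rough numpy sketch: the softmax is taken over a channels-by-channels cross-covariance matrix instead of a tokens-by-tokens one, so cost grows linearly with the number of tokens. The normalization scheme and dimensions below are illustrative assumptions, not XCiT's exact formulation.

```python
# Rough sketch of cross-covariance attention (XCA): attention weights are
# computed between feature channels, never between tokens.
import numpy as np

def xca(x, wq, wk, wv, tau=1.0):
    # x: (n_tokens, d). Project to queries, keys, and values.
    q, k, v = x @ wq, x @ wk, x @ wv
    # Normalize along the token axis so the covariance is well-scaled.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
    # Cross-covariance matrix: (d x d), independent of the token count.
    attn = (q.T @ k) / tau
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)   # softmax over channels
    return v @ attn.T

rng = np.random.default_rng(0)
n, d = 1024, 16                       # many tokens, few channels
w = lambda: rng.standard_normal((d, d)) * 0.1
out = xca(rng.standard_normal((n, d)), w(), w(), w())
print(out.shape)  # (1024, 16)
```

Because the softmax'd matrix is d x d rather than n x n, doubling the image resolution roughly doubles the cost instead of quadrupling it.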
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
Less attention vIsion Transformer builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.