Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation
- URL: http://arxiv.org/abs/2110.10183v1
- Date: Tue, 19 Oct 2021 18:03:30 GMT
- Title: Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation
- Authors: Bin Ren, Hao Tang, Nicu Sebe
- Abstract summary: It is hard for previous cross-view image translation methods to generate good images at the target view.
We propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network.
In the first stage, the CrossMLP sub-network learns the latent transformation cues between image code and semantic map code.
In the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem.
- Score: 70.00392682183515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is hard for previous cross-view image translation methods, which directly
adopt a simple encoder-decoder or U-Net structure, to generate good images at the
target view, especially when the two views differ drastically or the deformation
is severe. To ease this problem, we propose a novel two-stage framework with a new
Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one
refined pixel-level loss in the second stage. In the first stage, the CrossMLP
sub-network learns the latent transformation cues between image code and
semantic map code via our novel CrossMLP blocks. Then the coarse results are
generated progressively under the guidance of those cues. Moreover, in the
second stage, we design a refined pixel-level loss that eases the noisy
semantic label problem with more reasonable regularization in a more compact
fashion for better optimization. Extensive experimental results on
the Dayton (Vo and Hays, 2016) and CVUSA (Workman et al., 2015) datasets show
that our method can generate significantly better results than state-of-the-art
methods. The source code and trained models are available at
https://github.com/Amazingren/CrossMLP.
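The digest carries no code, so the following is a minimal PyTorch sketch of what one first-stage CrossMLP block could look like: standard MLP-Mixer token and channel mixing run on the image code and the semantic-map code in parallel, followed by a cross-stream fusion that produces a shared transformation cue. The module names and the fusion design are illustrative assumptions, not the authors' released implementation (see the repository above for that).
```python
import torch
import torch.nn as nn

class Mlp(nn.Module):
    """Two-layer MLP used for both token mixing and channel mixing."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1, self.act, self.fc2 = nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

class CrossMLPBlock(nn.Module):
    """Hypothetical two-stream mixer block: image tokens and semantic-map tokens
    are each token-mixed and channel-mixed (MLP-Mixer style), then fused so each
    stream can read transformation cues from the other (fusion design assumed)."""
    def __init__(self, num_tokens, dim, token_hidden=256, chan_hidden=512):
        super().__init__()
        self.token_norm_i, self.token_norm_s = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.token_mix_i = Mlp(num_tokens, token_hidden)  # mixes across spatial tokens
        self.token_mix_s = Mlp(num_tokens, token_hidden)
        self.chan_norm_i, self.chan_norm_s = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.chan_mix_i = Mlp(dim, chan_hidden)           # mixes across channels
        self.chan_mix_s = Mlp(dim, chan_hidden)
        self.fuse = nn.Linear(2 * dim, dim)               # cross-stream exchange (assumed)

    def forward(self, img, sem):
        # img, sem: (batch, num_tokens, dim)
        img = img + self.token_mix_i(self.token_norm_i(img).transpose(1, 2)).transpose(1, 2)
        sem = sem + self.token_mix_s(self.token_norm_s(sem).transpose(1, 2)).transpose(1, 2)
        img = img + self.chan_mix_i(self.chan_norm_i(img))
        sem = sem + self.chan_mix_s(self.chan_norm_s(sem))
        cue = self.fuse(torch.cat([img, sem], dim=-1))    # latent transformation cue
        return img + cue, sem + cue
```
Stacking such blocks progressively, at growing resolutions, would match the cascaded coarse-to-fine generation the abstract describes.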
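For the second stage, the abstract only states that the refined pixel-level loss counters noisy semantic labels with a compact regularization. One common realization of that idea is a confidence-weighted reconstruction loss; the sketch below is purely an assumption (the weighting scheme and the `lam` coefficient are ours, not the paper's).
```python
import torch

def refined_pixel_loss(fake, real, conf, lam=0.1, eps=1e-6):
    """Hypothetical confidence-weighted L1 reconstruction loss.
    fake, real: generated / ground-truth images, (B, 3, H, W).
    conf:       per-pixel confidence in (0, 1), e.g. from a sigmoid head,
                intended to down-weight pixels with noisy semantic labels.
    The -log(conf) term keeps the confidence map from collapsing to zero.
    """
    weighted_l1 = (conf * (fake - real).abs()).mean()
    regularizer = -torch.log(conf + eps).mean()
    return weighted_l1 + lam * regularizer
```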
Related papers
- Exploring Multi-view Pixel Contrast for General and Robust Image Forgery Localization [4.8454936010479335]
We propose a Multi-view Pixel-wise Contrastive algorithm (MPC) for image forgery localization.
Specifically, we first pre-train the backbone network with the supervised contrastive loss.
Then the localization head is fine-tuned using the cross-entropy loss, resulting in a better pixel localizer.
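A minimal sketch of the pre-training phase, assuming the standard supervised contrastive loss of Khosla et al. (2020); the forgery-specific construction of pixel views is omitted, and the function below is our illustration rather than the MPC code.
```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(feats, labels, tau=0.07):
    """Supervised contrastive loss: for each anchor, pull together all other
    samples in the batch that share its label (e.g. pristine vs. forged).
    feats:  (B, D) embeddings; labels: (B,) integer class labels.
    """
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau                        # (B, B) cosine similarities
    mask_self = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim.masked_fill_(mask_self, float('-inf'))           # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self).float()
    # average log-probability over each anchor's positives
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```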
arXiv Detail & Related papers (2024-06-19T13:51:52Z)
- MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation [110.09800389100599]
We propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation.
Our approach generates fine-grained patch-text pairs by mixing image patches while preserving the correspondence between patches and text.
With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment ability.
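One plausible reading of the mixing step, as a sketch: swap a fraction of patch slots between neighboring images in the batch and record which source image (hence which caption) owns each patch. The function name, swap strategy, and ownership map are our assumptions, not the MixReorg API.
```python
import torch

def mix_patches(images, patch=16, p=0.5):
    """Swap a fraction p of patch slots with the next image in the batch and
    return an index map recording each patch's source image.
    images: (B, C, H, W) with H, W divisible by `patch`.
    """
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    # (B, C, gh, patch, gw, patch) -> (B, gh*gw, C, patch, patch)
    patches = images.reshape(b, c, gh, patch, gw, patch)
    patches = patches.permute(0, 2, 4, 1, 3, 5).reshape(b, gh * gw, c, patch, patch)
    swap = torch.rand(b, gh * gw) < p            # which slots receive foreign patches
    src = torch.arange(b).roll(-1)               # partner image for each sample
    mixed = patches.clone()
    mixed[swap] = patches[src][swap]
    owner = torch.where(swap, src.unsqueeze(1).expand(b, gh * gw),
                        torch.arange(b).unsqueeze(1).expand(b, gh * gw))
    # reassemble into (B, C, H, W)
    out = mixed.reshape(b, gh, gw, c, patch, patch)
    out = out.permute(0, 3, 1, 4, 2, 5).reshape(b, c, h, w)
    return out, owner                            # owner[i, j]: caption index of patch j
```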
arXiv Detail & Related papers (2023-08-09T09:35:16Z)
- Improving Pixel-based MIM by Reducing Wasted Modeling Capability [77.99468514275185]
We propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction.
To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures.
Our method yields significant performance gains, such as 1.2% on fine-tuning, 2.8% on linear probing, and 2.6% on semantic segmentation.
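A minimal sketch of one common way to realize such fusion in an isotropic ViT: a learned softmax-weighted sum over tokens from selected blocks feeds the pixel-reconstruction decoder. The chosen layers and the weighting scheme are our assumptions, not the paper's exact design.
```python
import torch
import torch.nn as nn

class MultiLevelFusion(nn.Module):
    """Fuse shallow and deep transformer features so early high-frequency
    detail is not discarded before pixel reconstruction (illustrative)."""
    def __init__(self, dim, layer_ids=(2, 5, 8, 11)):
        super().__init__()
        self.layer_ids = layer_ids
        self.weights = nn.Parameter(torch.zeros(len(layer_ids)))  # softmax-normalized
        self.norm = nn.LayerNorm(dim)

    def forward(self, all_layer_tokens):
        # all_layer_tokens: list of (B, N, D) outputs, one per transformer block
        w = self.weights.softmax(dim=0)
        fused = sum(w[i] * all_layer_tokens[j] for i, j in enumerate(self.layer_ids))
        return self.norm(fused)  # input to the pixel-reconstruction decoder
```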
arXiv Detail & Related papers (2023-08-01T03:44:56Z)
- MLP-GAN for Brain Vessel Image Segmentation [19.807219907693145]
Brain vessel image segmentation can be used as a promising biomarker for better prevention and treatment of different diseases.
One successful approach is to treat segmentation as an image-to-image translation task and to train a conditional Generative Adversarial Network (cGAN) that learns a transformation between two distributions.
We present a novel multi-view approach that decomposes a 3D volumetric brain vessel image into three different 2D views (i.e., sagittal, coronal, axial) and then feeds them into three different 2D cGANs.
Our model thereby gains the ability to capture cross-patch information.
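The three-view decomposition itself is a few lines of tensor manipulation; the sketch below uses the usual radiological axis-to-plane convention and illustrative names.
```python
import torch

def orthogonal_views(volume):
    """Split a 3D volume into stacks of 2D slices along the three anatomical
    axes, so each stack can be fed to its own 2D cGAN.
    volume: (D, H, W) tensor.
    """
    axial    = volume                   # slices along depth:  (D, H, W)
    coronal  = volume.permute(1, 0, 2)  # slices along height: (H, D, W)
    sagittal = volume.permute(2, 0, 1)  # slices along width:  (W, D, H)
    return sagittal, coronal, axial
```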
arXiv Detail & Related papers (2022-07-17T19:24:38Z)
- PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation [84.97160975101718]
We propose a novel generative adversarial network, PI-Trans, which consists of a novel Parallel-ConvMLP module and an Implicit Transformation module at multiple semantic levels.
PI-Trans achieves the best qualitative and quantitative performance by a large margin compared to the state-of-the-art methods on two challenging datasets.
arXiv Detail & Related papers (2022-07-09T10:35:44Z)
- VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix [59.25846149124199]
This paper proposes a data augmentation method, namely cross-modal CutMix (CMC).
CMC transforms natural sentences from the textual view into a multi-modal view.
By attaching cross-modal noise on uni-modal data, it guides models to learn token-level interactions across modalities for better denoising.
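A sketch of the substitution as we read the abstract: replace a random fraction of word embeddings with image-patch embeddings of the same dimensionality, turning a uni-modal sequence into a mixed one. The assumption that the patch embeddings are already position-aligned with the words is ours.
```python
import torch

def cross_modal_cutmix(text_emb, patch_emb, p=0.25):
    """Replace a fraction p of word embeddings with patch embeddings.
    text_emb:  (B, L, D) word embeddings.
    patch_emb: (B, L, D) patch embeddings aligned to the same slots (assumed).
    Returns the mixed sequence and the boolean replacement mask.
    """
    mask = torch.rand(text_emb.shape[:2], device=text_emb.device) < p  # (B, L)
    mixed = torch.where(mask.unsqueeze(-1), patch_emb, text_emb)
    return mixed, mask
```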
arXiv Detail & Related papers (2022-06-17T17:56:47Z)
- Global and Local Alignment Networks for Unpaired Image-to-Image Translation [170.08142745705575]
The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style.
Due to the lack of attention to the content change in existing methods, semantic information from source images suffers from degradation during translation.
We introduce a novel approach, Global and Local Alignment Networks (GLA-Net).
Our method effectively generates sharper and more realistic images than existing approaches.
arXiv Detail & Related papers (2021-11-19T18:01:54Z)
- MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [0.0]
We propose a new unpaired image-to-image translation model called MixerGAN.
We show that MixerGAN achieves competitive results when compared to prior convolutional-based methods.
arXiv Detail & Related papers (2021-05-28T21:12:52Z)