Bi-level Feature Alignment for Versatile Image Translation and
Manipulation
- URL: http://arxiv.org/abs/2107.03021v1
- Date: Wed, 7 Jul 2021 05:26:29 GMT
- Title: Bi-level Feature Alignment for Versatile Image Translation and
Manipulation
- Authors: Fangneng Zhan, Yingchen Yu, Rongliang Wu, Kaiwen Cui, Aoran Xiao,
Shijian Lu, Ling Shao
- Abstract summary: Generative adversarial networks (GANs) have achieved great success in image translation and manipulation.
High-fidelity image generation with faithful style control remains a grand challenge in computer vision.
This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance.
- Score: 88.5915443957795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative adversarial networks (GANs) have achieved great success
in image translation and manipulation. However, high-fidelity image generation
with faithful style control remains a grand challenge in computer vision. This
paper presents a versatile image translation and manipulation framework that
achieves accurate semantic and style guidance in image generation by explicitly
building a correspondence. To handle the quadratic complexity incurred by
building dense correspondences, we introduce a bi-level feature alignment
strategy that adopts a top-$k$ operation to rank block-wise features, followed
by dense attention between features within the selected blocks, which reduces
the memory cost substantially. As the top-$k$ operation involves index
swapping, which precludes gradient propagation, we propose to approximate the
non-differentiable top-$k$ operation with a regularized earth mover's problem
so that its gradient can be effectively back-propagated. In addition, we design
a novel semantic position encoding mechanism that builds up a coordinate system
for each individual semantic region to preserve texture structures while
building correspondences. Further, we design a novel confidence feature
injection module that mitigates the mismatch problem by fusing features
adaptively according to the reliability of the built correspondences. Extensive
experiments show that our method achieves superior performance both
qualitatively and quantitatively compared with the state-of-the-art. The code
is available at https://github.com/fnzhan/RABIT.
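The bi-level feature alignment in the abstract can be pictured with a short
sketch: a first level summarizes and ranks key blocks per query, keeping only
the top-$k$ blocks, and a second level runs dense attention restricted to those
blocks. Below is a minimal NumPy illustration of that idea, not the paper's
RABIT implementation; the mean-pooled block summary and all function and
variable names are assumptions.

```python
import numpy as np

def bilevel_attention(query, key, value, block_size, k):
    """Bi-level alignment sketch: rank key blocks per query (level 1), then
    run dense attention only inside the selected top-k blocks (level 2).
    Full dense attention costs O(m * n) memory; this costs roughly
    O(m * (n / block_size + k * block_size))."""
    m, d = query.shape
    n = (key.shape[0] // block_size) * block_size       # drop any ragged tail
    nb = n // block_size
    block_feat = key[:n].reshape(nb, block_size, d).mean(axis=1)  # (nb, d)
    block_scores = query @ block_feat.T                 # coarse scores (m, nb)
    # Hard top-k; the paper replaces this step with a differentiable,
    # regularized earth mover's approximation (see the next sketch).
    topk = np.argsort(-block_scores, axis=1)[:, :k]

    out = np.zeros((m, value.shape[1]))
    offsets = np.arange(block_size)
    for i in range(m):
        idx = (topk[i][:, None] * block_size + offsets).ravel()
        logits = query[i] @ key[idx].T / np.sqrt(d)     # dense attention,
        w = np.exp(logits - logits.max())               # but only over the
        w /= w.sum()                                    # k selected blocks
        out[i] = w @ value[idx]
    return out
```

With, say, 8 queries against 1024 keys, `bilevel_attention(query, key, value,
block_size=16, k=4)` attends over only 64 of the 1024 positions per query.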
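The regularized earth mover's approximation of top-$k$ can be realized as
entropy-regularized optimal transport: the $n$ scores are transported onto two
anchors ("rejected" and "selected") with capacities $(n-k)/n$ and $k/n$, and
the soft selection mask is read off the transport plan. The sketch below
follows this general soft top-$k$ construction, which may differ in detail
from the paper's exact formulation; `eps` and the anchor choice are
assumptions.

```python
import numpy as np

def soft_topk(scores, k, eps=0.1, n_iter=100):
    """Soft top-k as entropy-regularized optimal transport (Sinkhorn).
    Returns a mask in [0, 1] that sums to k and approaches the hard top-k
    indicator as eps -> 0. Because every step is smooth, the mask is
    differentiable w.r.t. the scores when run under autodiff (e.g. the
    same iterations written in PyTorch)."""
    n = scores.shape[0]
    anchors = np.array([scores.min(), scores.max()])    # reject / select bins
    C = (scores[:, None] - anchors[None, :]) ** 2       # transport cost (n, 2)
    mu = np.full(n, 1.0 / n)                            # source marginal
    nu = np.array([(n - k) / n, k / n])                 # bin capacities
    K = np.exp(-C / eps)                                # Gibbs kernel
    u, v = np.ones(n), np.ones(2)
    for _ in range(n_iter):                             # Sinkhorn iterations
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return n * P[:, 1]                                  # soft selection mask
```

For `scores = np.array([0.1, 0.9, 0.4, 0.8, 0.2])` and `k = 2`, the returned
mask concentrates near 1 on the 0.9 and 0.8 entries and near 0 elsewhere, with
smaller `eps` giving a sharper mask.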
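The semantic position encoding gives each semantic region its own coordinate
system instead of one global grid, so relative positions inside a region
survive the warping. The abstract does not spell out the parameterization, so
the sketch below makes a concrete, assumed choice: coordinates normalized to
[0, 1] within each region's bounding box.

```python
import numpy as np

def semantic_position_encoding(label_map):
    """Per-region coordinates: each semantic region in the (h, w) integer
    label map gets (y, x) coordinates normalized to [0, 1] within its own
    bounding box, preserving texture structure region by region."""
    h, w = label_map.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    enc = np.zeros((h, w, 2))
    for lbl in np.unique(label_map):
        m = label_map == lbl
        y0, y1 = ys[m].min(), ys[m].max()
        x0, x1 = xs[m].min(), xs[m].max()
        enc[m, 0] = (ys[m] - y0) / max(y1 - y0, 1.0)    # avoid divide-by-zero
        enc[m, 1] = (xs[m] - x0) / max(x1 - x0, 1.0)
    return enc
```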
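The confidence feature injection fuses the warped exemplar features with the
conditional-input features according to how reliable each built correspondence
is. A convex blend captures the core idea; the paper's module is more
elaborate, and `conf` here is a hypothetical per-position reliability map in
[0, 1]:

```python
import numpy as np

def confidence_injection(f_cond, f_warp, conf):
    """Where conf is near 1 the correspondence is trusted and the warped
    exemplar feature dominates; where it is near 0 the fusion falls back
    on the conditional-input feature. conf has shape (h, w, 1) and
    broadcasts over the channel axis of the (h, w, c) feature maps."""
    return conf * f_warp + (1.0 - conf) * f_cond
```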
Related papers
- A Spitting Image: Modular Superpixel Tokenization in Vision Transformers [0.0]
Vision Transformer (ViT) architectures traditionally employ a grid-based approach to tokenization independent of the semantic content of an image.
We propose a modular superpixel tokenization strategy which decouples tokenization and feature extraction.
arXiv Detail & Related papers (2024-08-14T17:28:58Z)
- Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection [41.35861722481721]
Deepfake threats to society and cybersecurity have provoked significant public apprehension.
This paper introduces an elegantly simple yet effective strategy named Thumbnail Layout (TALL).
TALL transforms a video clip into a predefined layout that preserves spatial and temporal dependencies.
arXiv Detail & Related papers (2024-03-15T12:48:44Z)
- Unsupervised Structure-Consistent Image-to-Image Translation [6.282068591820945]
The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation.
We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers.
The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code.
arXiv Detail & Related papers (2022-08-24T13:47:15Z)
- Marginal Contrastive Correspondence for Guided Image Generation [58.0605433671196]
Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar from two different domains.
Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains.
We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
arXiv Detail & Related papers (2022-04-01T13:55:44Z)
- DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor Points [15.953570826460869]
Establishing dense correspondence between two images is a fundamental computer vision problem.
We introduce DenseGAP, a new solution for efficient Dense correspondence learning with a Graph-structured neural network conditioned on Anchor Points.
Our method advances the state-of-the-art of correspondence learning on most benchmarks.
arXiv Detail & Related papers (2021-12-13T18:59:30Z)
- Semantic Layout Manipulation with High-Resolution Sparse Attention [106.59650698907953]
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
arXiv Detail & Related papers (2020-12-14T06:50:43Z)
- GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network [176.3781969089004]
The feature correlation layer serves as a key neural network module in computer vision problems that involve dense correspondences between image pairs.
We propose GOCor, a fully differentiable dense matching module, acting as a direct replacement to the feature correlation layer.
Our approach significantly outperforms the feature correlation layer for the tasks of geometric matching, optical flow, and dense semantic matching.
arXiv Detail & Related papers (2020-09-16T17:33:01Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)