RefineStyle: Dynamic Convolution Refinement for StyleGAN
- URL: http://arxiv.org/abs/2410.06104v1
- Date: Tue, 8 Oct 2024 15:01:30 GMT
- Title: RefineStyle: Dynamic Convolution Refinement for StyleGAN
- Authors: Siwei Xia, Xueqi Hu, Li Sun, Qingli Li
- Abstract summary: In StyleGAN, convolution kernels are shaped by both static parameters shared across images and dynamic modulation factors specific to each image.
The $\mathcal{W}^+$ space is often used for image inversion and editing.
This paper proposes an efficient refining strategy for dynamic kernels.
- Score: 15.230430037135017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In StyleGAN, convolution kernels are shaped by both static parameters shared across images and dynamic modulation factors $w^+\in\mathcal{W}^+$ specific to each image. Therefore, the $\mathcal{W}^+$ space is often used for image inversion and editing. However, the pre-trained model struggles to synthesize out-of-domain images due to the limited capabilities of $\mathcal{W}^+$ and its resultant kernels, necessitating full fine-tuning or adaptation through a complex hypernetwork. This paper proposes an efficient refining strategy for dynamic kernels. The key idea is to modify the kernels by low-rank residuals, learned from the input image or domain guidance. These residuals are generated by matrix multiplication between two token sets of the same size, which controls the complexity. We validate the refining scheme on image inversion and domain adaptation. In the former task, we design grouped transformer blocks to learn these token sets with one- or two-stage training. In the latter task, the token sets are directly optimized to support synthesis in the target domain while preserving the original content. Extensive experiments show that our method achieves low distortion for image inversion and high quality for out-of-domain editing.
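As a concrete illustration of the refining strategy, here is a minimal PyTorch sketch of a modulated StyleGAN2 kernel receiving a low-rank residual built from two token sets. All shapes, names, and the demodulation step are our assumptions, not the paper's code.

```python
import torch

def refine_dynamic_kernel(base_kernel, style, tokens_a, tokens_b):
    # base_kernel: static StyleGAN weight shared across images, (C_out, C_in, k, k)
    # style: per-image modulation factors derived from w+, shape (C_in,)
    # tokens_a: (C_out, r); tokens_b: (r, C_in * k * k); the token count r is the
    # rank of the residual and controls the refinement's complexity
    C_out, C_in, k, _ = base_kernel.shape
    modulated = base_kernel * style.view(1, C_in, 1, 1)        # StyleGAN2 modulation
    residual = (tokens_a @ tokens_b).view(C_out, C_in, k, k)   # low-rank residual
    refined = modulated + residual
    demod = torch.rsqrt((refined ** 2).sum(dim=(1, 2, 3), keepdim=True) + 1e-8)
    return refined * demod                                     # StyleGAN2 demodulation
```

For scale: with 512 channels and a 3x3 kernel, a rank of r = 16 adds roughly 82K residual parameters per layer, against about 2.4M for the full kernel, which is why the low-rank form stays cheap.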
Related papers
- Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation [18.50799240622156]
Proposes the Guided Patch-Grouping Wavelet Transformer (GPWFormer).
$\mathcal{T}$ takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies.
$\mathcal{C}$ takes the downsampled image as input for learning the category-wise deep context.
arXiv Detail & Related papers (2023-07-03T02:19:48Z) - WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting [2.3014300466616078]
This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint.
It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trainable discriminator.
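A minimal sketch of the WaveMix-style token mixer the summary describes, assuming a channel-last NumPy feature map and the PyWavelets library; WavePaint's actual blocks also interleave convolutional layers and restore the original resolution.

```python
import numpy as np
import pywt

def dwt_token_mix(x, wavelet="haar"):
    # x: feature map of shape (H, W, C). One level of 2D-DWT per channel acts
    # as a parameter-free spatial, multi-resolution token mixer.
    subbands = []
    for c in range(x.shape[-1]):
        LL, (LH, HL, HH) = pywt.dwt2(x[..., c], wavelet)
        subbands.append(np.stack([LL, LH, HL, HH], axis=-1))
    # Output has half the spatial resolution and 4x the channels; a pointwise
    # convolution (not shown) would mix these subbands before upsampling back.
    return np.concatenate(subbands, axis=-1)
```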
arXiv Detail & Related papers (2023-07-01T18:41:34Z) - Vision Transformers with Mixed-Resolution Tokenization [34.18534105043819]
Vision Transformer models process input images by dividing them into a spatially regular grid of equal-size patches.
We introduce a novel image tokenization scheme, replacing the standard uniform grid with a mixed-resolution sequence of tokens.
Using the Quadtree algorithm and a novel saliency scorer, we construct a patch mosaic where low-saliency areas of the image are processed in low resolution.
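A minimal sketch of quadtree-based mixed-resolution patching, assuming a hypothetical `saliency` scoring function and image dimensions divisible by `max_size`; the paper's saliency scorer and token packing are more involved.

```python
import numpy as np

def quadtree_patches(img, saliency, min_size=16, max_size=64, thresh=0.5):
    # Recursively split patches whose saliency is high, so salient regions end
    # up as many small (high-resolution) tokens and flat regions as few large ones.
    patches = []
    def split(x0, y0, size):
        region = img[y0:y0 + size, x0:x0 + size]
        if size > min_size and saliency(region) > thresh:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    split(x0 + dx, y0 + dy, half)
        else:
            patches.append((x0, y0, size))          # one token per leaf patch
    H, W = img.shape[:2]
    for y in range(0, H, max_size):
        for x in range(0, W, max_size):
            split(x, y, max_size)
    return patches
```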
arXiv Detail & Related papers (2023-04-01T10:39:46Z) - Raising The Limit Of Image Rescaling Using Auxiliary Encoding [7.9700865143145485]
Recently, image rescaling models like IRN have utilized the bidirectional nature of invertible neural networks (INNs) to push the performance limit of image upscaling.
We propose auxiliary encoding modules to further push the limit of image rescaling performance.
arXiv Detail & Related papers (2023-03-12T20:49:07Z) - Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator by making a local edit to its weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
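The per-image tuning loop can be pictured roughly as below; this sketch assumes a generator `G` callable on a pivot latent and uses a plain MSE loss, whereas the paper's objective and locality constraints on the weight edit differ.

```python
import torch

def tune_generator(G, w_pivot, target, steps=200, lr=1e-3):
    # Per-image tuning sketch: nudge the generator weights so that G(w_pivot)
    # reproduces the target image, after which semantic edits in latent space
    # transfer to that image.
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(G(w_pivot), target)
        loss.backward()
        opt.step()
    return G
```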
arXiv Detail & Related papers (2023-02-22T14:47:57Z) - Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART employs both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
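The sparse module can be pictured as self-attention within dilated token groups, as in this single-head sketch of our own construction (no projections or normalization, which the actual ART blocks include; H and W are assumed divisible by the dilation).

```python
import torch

def sparse_group_attention(x, dilation=2):
    # x: (B, H, W, C) token grid. Tokens are grouped on a dilated lattice so
    # each group spans the whole image; attention runs inside each group.
    B, H, W, C = x.shape
    g = x.view(B, H // dilation, dilation, W // dilation, dilation, C)
    g = g.permute(0, 2, 4, 1, 3, 5).reshape(B * dilation ** 2, -1, C)
    attn = torch.softmax(g @ g.transpose(-2, -1) / C ** 0.5, dim=-1)
    out = (attn @ g).view(B, dilation, dilation, H // dilation, W // dilation, C)
    return out.permute(0, 3, 1, 4, 2, 5).reshape(B, H, W, C)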
arXiv Detail & Related papers (2022-10-04T07:35:01Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
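A top-k attention sketch conveys the sparse-matching idea; note that DynaST additionally adapts the number of attended tokens per position, which this fixed-k stand-in of ours does not.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=8):
    # Each query attends only to its top-k highest-scoring keys.
    # q: (B, Lq, D), k/v: (B, Lk, D)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)    # (B, Lq, Lk)
    vals, idx = scores.topk(top_k, dim=-1)
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, vals)                 # keep only top-k scores
    attn = F.softmax(mask, dim=-1)               # zero weight off the top-k
    return attn @ v
```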
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Hybrid Model-based / Data-driven Graph Transform for Image Coding [54.31406300524195]
We present a hybrid model-based / data-driven approach to encode an intra-prediction residual block.
The first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST), for stability.
Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieves better energy compaction than the default discrete cosine transform (DCT) and better stability than the Karhunen-Loève transform (KLT).
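A sketch of the hybrid construction under our own simplifications: the first $K$ rows come from a DST-VII-style ADST model, and the rest are completed from training data via the KLT plus Gram-Schmidt; the paper instead derives the data-driven part as a graph transform.

```python
import numpy as np

def hybrid_transform(train_blocks, N=8, K=3):
    # train_blocks: (num_samples, N) intra-prediction residual vectors.
    n = np.arange(1, N + 1)
    model = np.array([np.sin(np.pi * (2 * k - 1) * n / (2 * N + 1))
                      for k in range(1, K + 1)])          # DST-VII-style rows
    model /= np.linalg.norm(model, axis=1, keepdims=True)
    _, eigvecs = np.linalg.eigh(np.cov(train_blocks.T))
    klt = eigvecs[:, ::-1].T                              # principal components first
    basis = list(model)
    for v in klt:                                         # complete the basis from data
        v = v - sum((v @ b) * b for b in basis)           # Gram-Schmidt step
        if np.linalg.norm(v) > 1e-8:
            basis.append(v / np.linalg.norm(v))
        if len(basis) == N:
            break
    return np.vstack(basis)                               # rows: transform basis vectors
```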
arXiv Detail & Related papers (2022-03-02T15:36:44Z) - Enhancing variational generation through self-decomposition [0.0]
We introduce the notion of a Split Variational Autoencoder (SVAE).
The network is trained as a usual Variational Autoencoder with a negative log-likelihood loss between training and reconstructed images.
According to the FID metric, our technique, tested on typical datasets such as MNIST, CIFAR-10, and CelebA, outperforms all previous purely variational architectures.
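For reference, a standard VAE training objective of the kind the summary mentions (our generic formulation, not SVAE's split-specific decomposition):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term plus KL divergence to the unit-Gaussian prior.
    # With a fixed-variance Gaussian decoder, the negative log-likelihood
    # between the training image and its reconstruction reduces to a squared error.
    nll = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return nll + kl
```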
arXiv Detail & Related papers (2022-02-06T08:49:21Z) - Fast and High-Quality Image Denoising via Malleable Convolutions [72.18723834537494]
We present Malleable Convolution (MalleConv) as an efficient variant of dynamic convolution.
Unlike previous works, MalleConv generates a much smaller set of spatially-varying kernels from the input.
We also build an efficient denoising network using MalleConv, coined as MalleNet.
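A sketch of that idea: predict a coarse grid of kernels from a downsampled input, upsample the kernel map, and apply it as a spatially-varying depthwise filter. The `kernel_predictor` head, the grid factor, and the softmax normalization are all our assumptions.

```python
import torch
import torch.nn.functional as F

def malleable_filtering(x, kernel_predictor, k=3, grid=8):
    # x: (B, C, H, W); kernel_predictor is an assumed module producing
    # (B, C*k*k, H/grid, W/grid) per-location, per-channel kernel logits.
    B, C, H, W = x.shape
    coarse = F.adaptive_avg_pool2d(x, (H // grid, W // grid))
    kernels = kernel_predictor(coarse)
    kernels = F.interpolate(kernels, size=(H, W), mode="bilinear",
                            align_corners=False)
    weights = kernels.view(B, C, k * k, H, W).softmax(dim=2)
    patches = F.unfold(x, k, padding=k // 2).view(B, C, k * k, H, W)
    return (patches * weights).sum(dim=2)        # depthwise dynamic filtering
```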
arXiv Detail & Related papers (2022-01-02T18:35:20Z) - NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation [139.8037697822064]
We present a non-parametric structured latent variable model for image generation, called NP-DRAW.
It sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas.
arXiv Detail & Related papers (2021-06-25T05:17:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.