TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual
Vision Transformer for Fast Arbitrary One-Shot Image Generation
- URL: http://arxiv.org/abs/2302.08047v1
- Date: Thu, 16 Feb 2023 03:05:59 GMT
- Title: TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual
Vision Transformer for Fast Arbitrary One-Shot Image Generation
- Authors: Yunliang Jiang, Lili Yan, Xiongtao Zhang, Yong Liu, Danfeng Sun
- Abstract summary: One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose TcGAN, a novel structure-preserving method with an individual vision transformer, to overcome the shortcomings of existing one-shot image generation methods.
- Score: 11.207512995742999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot image generation (OSG) with generative adversarial networks that
learn from the internal patches of a given image has attracted worldwide
attention. In recent studies, scholars have primarily focused on extracting
image features from probabilistically distributed inputs with pure
convolutional neural networks (CNNs). However, CNNs with limited receptive
fields struggle to extract and maintain global structural information.
Therefore, in this paper, we propose TcGAN, a novel structure-preserving
method with an individual vision transformer, to overcome the shortcomings of
existing one-shot image generation methods. Specifically, TcGAN preserves the
global structure of an image during training while remaining compatible with
local details, and maintains the integrity of semantic information by
exploiting the transformer's powerful long-range dependency modeling. We also
propose a new scaling formula that is scale-invariant during computation,
which effectively improves the quality of images the OSG model generates on
image super-resolution tasks. We present the design of the TcGAN framework
together with comprehensive experiments and ablation studies demonstrating
that TcGAN achieves arbitrary image generation with the fastest running time.
Lastly, TcGAN delivers excellent performance when applied to other image
processing tasks, e.g., super-resolution and image harmonization; these
results further demonstrate its superiority.
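The paper's abstract does not include code, but the core design it describes, a convolutional branch for local texture combined with a transformer branch for long-range structure, can be made concrete with a minimal PyTorch sketch. Everything below (module names, shapes, fusion by concatenation) is a hypothetical illustration of that general idea, not the authors' implementation.

```python
# Minimal sketch of a hybrid generator: a transformer branch models
# long-range (global) structure while convolutions handle local texture.
# Hypothetical layout; NOT the authors' released TcGAN implementation.
import torch
import torch.nn as nn

class HybridGenerator(nn.Module):
    def __init__(self, ch=64, depth=4, heads=4):
        super().__init__()
        self.local = nn.Sequential(            # CNN branch: local details
            nn.Conv2d(3, ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
        layer = nn.TransformerEncoderLayer(d_model=ch, nhead=heads,
                                           batch_first=True)
        self.globl = nn.TransformerEncoder(layer, num_layers=depth)
        self.out = nn.Conv2d(2 * ch, 3, 3, padding=1)

    def forward(self, noise_img):              # noise_img: (B, 3, H, W)
        f = self.local(noise_img)              # (B, C, H, W) local features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) patch tokens
        g = self.globl(tokens)                 # global structure via attention
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return torch.tanh(self.out(torch.cat([f, g], dim=1)))

x = torch.randn(1, 3, 32, 32)                  # noise at one pyramid scale
print(HybridGenerator()(x).shape)              # torch.Size([1, 3, 32, 32])
```

Concatenating the two feature streams before the output convolution is one simple way to keep global structure "compatible with local details" as the abstract puts it; the paper may fuse them differently.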
Related papers
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions [26.09373405194564]
We present an efficient image processing transformer architecture with hierarchical attentions, called IPT-V2.
We adopt a focal context self-attention (FCSA) and a global grid self-attention (GGSA) to obtain adequate token interactions in local and global receptive fields.
Our proposed IPT-V2 achieves state-of-the-art results on various image processing tasks, covering denoising, deblurring, and deraining, and obtains a much better trade-off between performance and computational complexity than previous methods.
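As a rough illustration of the local/global split behind FCSA and GGSA, the sketch below partitions a feature map into non-overlapping windows for local attention and into a strided grid for global attention. This mirrors the spirit of the summary above; it is an assumption, not the IPT-V2 source.

```python
# Illustrative local-window vs. global-grid token grouping (assumed to
# approximate the FCSA/GGSA idea; not the IPT-V2 implementation).
import torch
import torch.nn as nn

def window_tokens(x, k):            # x: (B, C, H, W) -> (B*nW, k*k, C)
    B, C, H, W = x.shape
    x = x.reshape(B, C, H // k, k, W // k, k)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(-1, k * k, C)

def grid_tokens(x, k):              # strided grid: each group spans the image
    B, C, H, W = x.shape
    x = x.reshape(B, C, k, H // k, k, W // k)
    return x.permute(0, 3, 5, 2, 4, 1).reshape(-1, k * k, C)

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(2, 32, 16, 16)
local = window_tokens(x, 4)          # attention inside 4x4 windows
globl = grid_tokens(x, 4)            # attention across a dispersed 4x4 grid
out_local, _ = attn(local, local, local)
out_global, _ = attn(globl, globl, globl)
print(out_local.shape, out_global.shape)
```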
arXiv Detail & Related papers (2024-03-31T10:01:20Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
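The summary suggests attention re-weighted by spatial relationships between image components. One plausible form, my assumption rather than the paper's exact DWT formula, is a distance-decay bias added to the attention logits:

```python
# A plausible distance-based weighting: bias attention logits by the
# negative Euclidean distance between token positions (assumption only;
# the paper's exact DWT weighting may differ).
import torch

def distance_weighted_attention(q, k, v, coords, alpha=0.1):
    # q, k, v: (B, N, C); coords: (N, 2) pixel positions of the N tokens
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)   # (B, N, N)
    dist = torch.cdist(coords, coords)                         # (N, N)
    scores = scores - alpha * dist            # nearer tokens weigh more
    return torch.softmax(scores, dim=-1) @ v

N, C = 64, 32
coords = torch.stack(torch.meshgrid(torch.arange(8), torch.arange(8),
                                    indexing="ij"), -1).reshape(N, 2).float()
q = k = v = torch.randn(1, N, C)
print(distance_weighted_attention(q, k, v, coords).shape)  # (1, 64, 32)
```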
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- High-Quality Pluralistic Image Completion via Code Shared VQGAN [51.7805154545948]
We present a novel framework for pluralistic image completion that achieves both high quality and diversity at a much faster inference speed.
Our framework is able to learn semantically-rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality.
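For readers unfamiliar with the "discrete codes" mentioned above, here is a minimal vector-quantization step in the general VQGAN style; codebook size and dimensions are illustrative, not taken from this paper.

```python
# Minimal vector quantization as used in VQGAN-style models: each
# feature vector snaps to its nearest codebook entry. Illustrative only.
import torch
import torch.nn as nn

codebook = nn.Embedding(512, 64)            # 512 discrete codes, dim 64
z = torch.randn(2, 64, 16, 16)              # encoder features (B, C, H, W)
flat = z.permute(0, 2, 3, 1).reshape(-1, 64)             # (B*H*W, C)
idx = torch.cdist(flat, codebook.weight).argmin(dim=1)   # nearest code id
z_q = codebook(idx).reshape(2, 16, 16, 64).permute(0, 3, 1, 2)
print(idx.shape, z_q.shape)  # discrete codes and quantized features
```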
arXiv Detail & Related papers (2022-04-05T01:47:35Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
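To make the sampling/recovery split concrete, here is a toy end-to-end compressive sensing pipeline: a learned linear sampling matrix compresses image blocks and a small network reconstructs them. The architecture is a stand-in of my own, not CSformer's.

```python
# Toy compressive sensing: a learned sampling matrix compresses image
# blocks, and a small network recovers them (assumed stand-in for
# adaptive sampling + CNN/transformer recovery, not CSformer's code).
import torch
import torch.nn as nn

B, n, ratio = 8, 32 * 32, 0.1               # block pixels, sampling ratio
m = int(n * ratio)                          # number of measurements

sample = nn.Linear(n, m, bias=False)        # adaptive (learned) sampling
recover = nn.Sequential(                    # simple recovery network
    nn.Linear(m, n), nn.Unflatten(1, (1, 32, 32)),
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

x = torch.rand(B, 1, 32, 32)                # image blocks
y = sample(x.flatten(1))                    # (B, m) compressed measurements
x_hat = recover(y)                          # (B, 1, 32, 32) reconstruction
loss = nn.functional.mse_loss(x_hat, x)     # end-to-end training objective
print(y.shape, x_hat.shape, float(loss))
```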
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Image-specific Convolutional Kernel Modulation for Single Image Super-resolution [85.09413241502209]
To address this issue, we propose a novel image-specific convolutional kernel modulation (IKM) method.
We exploit the global contextual information of images or features to generate attention weights that adaptively modulate the convolutional kernels.
Experiments on single image super-resolution show that the proposed method achieves superior performance over state-of-the-art methods.
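A hedged sketch of that idea follows: an attention vector derived from global context scales a convolution kernel per image. This is one plausible reading of IKM, not the released code.

```python
# Sketch: modulate a conv kernel with attention weights derived from the
# image's global context (illustrative reading of IKM, not its code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv(nn.Module):
    def __init__(self, ch=32, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(ch, ch, k, k) * 0.05)
        self.attn = nn.Sequential(             # global context -> per-channel
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, ch), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        a = self.attn(x)                       # (B, C) modulation weights
        out = []
        for i in range(x.shape[0]):            # per-image kernel modulation
            w = self.weight * a[i].view(-1, 1, 1, 1)
            out.append(F.conv2d(x[i:i+1], w, padding=1))
        return torch.cat(out, dim=0)

x = torch.randn(2, 32, 24, 24)
print(ModulatedConv()(x).shape)                # torch.Size([2, 32, 24, 24])
```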
arXiv Detail & Related papers (2021-11-16T11:05:10Z)
- LT-GAN: Self-Supervised GAN with Latent Transformation Detection [10.405721171353195]
We propose a self-supervised approach (LT-GAN) to improve the generation quality and diversity of images.
We experimentally demonstrate that our proposed LT-GAN can be effectively combined with other state-of-the-art training techniques for added benefits.
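The self-supervision signal can be sketched as: perturb a latent code, generate both images, and train an auxiliary head to recover the perturbation. This is a paraphrase of the latent-transformation-detection idea with toy networks; details differ from the paper.

```python
# Sketch of latent-transformation self-supervision: an auxiliary head
# must recover the latent perturbation from a pair of generated images
# (a paraphrase of the LT-GAN idea, not the authors' code).
import torch
import torch.nn as nn

z_dim = 64
G = nn.Sequential(nn.Linear(z_dim, 3 * 8 * 8), nn.Tanh())   # toy generator
E = nn.Sequential(nn.Linear(2 * 3 * 8 * 8, 128), nn.ReLU(),
                  nn.Linear(128, z_dim))                     # estimator head

z = torch.randn(16, z_dim)
eps = 0.5 * torch.randn(16, z_dim)          # latent transformation
pair = torch.cat([G(z), G(z + eps)], dim=1) # images before/after transform
eps_hat = E(pair)                           # predict the transformation
ssl_loss = nn.functional.mse_loss(eps_hat, eps)  # added to the GAN loss
print(float(ssl_loss))
```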
arXiv Detail & Related papers (2020-10-19T22:09:45Z)
- Efficient texture-aware multi-GAN for image inpainting [5.33024001730262]
Recent inpainting methods based on generative adversarial networks (GANs) show remarkable improvements.
We propose a multi-GAN architecture that improves both performance and rendering efficiency.
arXiv Detail & Related papers (2020-09-30T14:58:03Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the goal of maintaining spatially precise high-resolution representations throughout the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
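A minimal sketch of that multi-scale idea: keep a full-resolution stream while fusing in context from a downsampled stream. The layout is hypothetical, in the spirit of the summary rather than the paper's exact design.

```python
# Sketch: a full-resolution stream fused with context from a
# lower-resolution stream (hypothetical layout, in the paper's spirit).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.hi = nn.Conv2d(ch, ch, 3, padding=1)   # full-res: spatial detail
        self.lo = nn.Conv2d(ch, ch, 3, padding=1)   # half-res: wider context
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        hi = F.relu(self.hi(x))
        lo = F.relu(self.lo(F.avg_pool2d(x, 2)))    # process at half size
        lo = F.interpolate(lo, size=x.shape[-2:], mode="bilinear",
                           align_corners=False)     # back to full resolution
        return x + self.fuse(torch.cat([hi, lo], dim=1))  # residual fusion

x = torch.randn(1, 32, 64, 64)
print(MultiScaleBlock()(x).shape)                   # torch.Size([1, 32, 64, 64])
```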
arXiv Detail & Related papers (2020-03-15T11:04:30Z)