Texture Image Synthesis Using Spatial GAN Based on Vision Transformers
- URL: http://arxiv.org/abs/2502.01842v1
- Date: Mon, 03 Feb 2025 21:39:30 GMT
- Title: Texture Image Synthesis Using Spatial GAN Based on Vision Transformers
- Authors: Elahe Salari, Zohreh Azimifar
- Abstract summary: We propose ViT-SGAN, a new hybrid model that fuses Vision Transformers (ViTs) with a Spatial Generative Adversarial Network (SGAN) to address the limitations of previous methods.
By incorporating specialized texture descriptors such as mean-variance (μ, σ) and textons into the self-attention mechanism of ViTs, our model achieves superior texture synthesis.
- Score: 1.6482333106552793
- Abstract: Texture synthesis is a fundamental task in computer vision whose goal is to generate visually realistic and structurally coherent textures for a wide range of applications, from graphics to scientific simulations. While traditional methods like tiling and patch-based techniques often struggle with complex textures, recent advances in deep learning have transformed the field. In this paper, we propose ViT-SGAN, a new hybrid model that fuses Vision Transformers (ViTs) with a Spatial Generative Adversarial Network (SGAN) to address the limitations of previous methods. By incorporating specialized texture descriptors such as mean-variance (μ, σ) and textons into the self-attention mechanism of ViTs, our model achieves superior texture synthesis. This approach enhances the model's capacity to capture complex spatial dependencies, yielding texture quality superior to that of state-of-the-art models, especially for regular and irregular textures. Comparison experiments with metrics such as FID, IS, SSIM, and LPIPS demonstrate the substantial improvement of ViT-SGAN, underlining its effectiveness in generating diverse, realistic textures.
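To make the core idea concrete, here is a minimal sketch (not the authors' code; all module and variable names are hypothetical) of how per-patch mean-variance descriptors could be folded into a ViT self-attention layer before attention is computed:

```python
import torch
import torch.nn as nn

class DescriptorAugmentedAttention(nn.Module):
    """Self-attention over patch tokens augmented with per-patch
    mean/std descriptors, loosely following the (mu, sigma) idea
    in the abstract. Hypothetical sketch, not the authors' code."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.fuse = nn.Linear(dim + 2, dim)  # fold (mu, sigma) into each token
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len, dim) patch embeddings
        mu = tokens.mean(dim=-1, keepdim=True)    # per-patch mean descriptor
        sigma = tokens.std(dim=-1, keepdim=True)  # per-patch std descriptor
        x = self.fuse(torch.cat([tokens, mu, sigma], dim=-1))
        out, _ = self.attn(x, x, x)
        return out

layer = DescriptorAugmentedAttention(dim=64)
patches = torch.randn(2, 196, 64)   # e.g. a 14x14 patch grid
print(layer(patches).shape)         # torch.Size([2, 196, 64])
```

A texton branch would extend the descriptor concatenation in the same way; it is omitted here for brevity.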
Related papers
- DTSGAN: Learning Dynamic Textures via Spatiotemporal Generative Adversarial Network [11.511407106519245]
We introduce a spatiotemporal generative adversarial video network (DTSGAN) that can learn from a single dynamic texture.
With the DTSGAN pipeline, a new video sequence is generated from the coarsest scale to the finest.
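A minimal sketch of the coarse-to-fine idea (hypothetical; the generators and scale choices are placeholders, not the DTSGAN architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical coarse-to-fine sketch: one small conv generator per scale,
# each refining an upsampled version of the previous scale's output.
scales = [(16, 16), (32, 32), (64, 64)]
gens = [nn.Conv2d(3 + 3, 3, kernel_size=3, padding=1) for _ in scales]

x = torch.zeros(1, 3, *scales[0])  # blank canvas at the coarsest scale
for gen, size in zip(gens, scales):
    # no-op at the coarsest scale, 2x upsampling afterwards
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    z = torch.randn(1, 3, *size)            # per-scale noise input
    x = x + gen(torch.cat([x, z], dim=1))   # residual refinement at this scale
print(x.shape)  # torch.Size([1, 3, 64, 64])
```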
arXiv Detail & Related papers (2024-12-22T09:49:48Z)
- NeRF-Texture: Synthesizing Neural Radiance Field Textures [77.24205024987414]
We propose a novel texture synthesis method with Neural Radiance Fields (NeRF) to capture and synthesize textures from given multi-view images.
In the proposed NeRF texture representation, a scene with fine geometric details is disentangled into the meso-structure textures and the underlying base shape.
We can synthesize NeRF-based textures through patch matching of latent features.
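A minimal sketch of patch matching in a latent space, assuming a simple nearest-neighbour rule over cosine similarity (hypothetical; not the paper's actual matching procedure):

```python
import torch
import torch.nn.functional as F

def match_latent_patches(query, bank):
    """Replace each query latent with its nearest neighbour (cosine
    similarity) from a bank of reference latent patches. Hypothetical
    sketch of latent-space patch matching."""
    q = F.normalize(query, dim=-1)    # (N, D) latents of patches to synthesize
    b = F.normalize(bank, dim=-1)     # (M, D) latents from captured views
    idx = (q @ b.t()).argmax(dim=-1)  # nearest reference patch per query
    return bank[idx]

query = torch.randn(100, 32)
bank = torch.randn(500, 32)
print(match_latent_patches(query, bank).shape)  # torch.Size([100, 32])
```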
arXiv Detail & Related papers (2024-12-13T09:41:48Z)
- A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis [9.687982148528187]
Convolutional Neural Networks (CNNs) are currently among the best texture analysis approaches.
Vision Transformers (ViTs) have been surpassing the performance of CNNs on tasks such as object recognition.
This work explores various pre-trained ViT architectures when transferred to tasks that rely on textures.
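A minimal sketch of the transfer setting the survey studies, using torchvision's pre-trained `vit_b_16` as a frozen feature extractor with a small trainable head (the class count and input size are placeholders):

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Pre-trained ViT as a frozen texture feature extractor; only a
# linear head would be trained on the texture dataset.
weights = ViT_B_16_Weights.IMAGENET1K_V1
backbone = vit_b_16(weights=weights)
backbone.heads = nn.Identity()        # expose the class-token embedding
for p in backbone.parameters():
    p.requires_grad = False           # freeze the backbone

head = nn.Linear(768, 47)             # e.g. 47 texture classes (DTD-sized)

images = weights.transforms()(torch.rand(4, 3, 256, 256))  # preprocess
with torch.no_grad():
    feats = backbone(images)          # (4, 768) texture features
print(head(feats).shape)              # torch.Size([4, 47])
```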
arXiv Detail & Related papers (2024-06-10T09:48:13Z)
- Generating Non-Stationary Textures using Self-Rectification [70.91414475376698]
This paper addresses the challenge of example-based non-stationary texture synthesis.
We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools.
Our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture.
arXiv Detail & Related papers (2024-01-05T15:07:05Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
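A minimal sketch of distance-based weighting, assuming attention logits penalized by pairwise token distances (hypothetical; the paper's DWT block is more elaborate):

```python
import torch

def distance_weighted_attention(q, k, v, coords, tau=1.0):
    """Hypothetical sketch: attention logits are penalized by the spatial
    distance between token positions, so nearby image components interact
    more strongly."""
    dist = torch.cdist(coords, coords)                 # pairwise token distances
    logits = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    attn = torch.softmax(logits - dist / tau, dim=-1)  # distance penalty
    return attn @ v

ys, xs = torch.meshgrid(torch.arange(7.0), torch.arange(7.0), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1)  # 7x7 token grid
q = k = v = torch.randn(49, 32)
print(distance_weighted_attention(q, k, v, coords).shape)  # torch.Size([49, 32])
```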
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Diffusion-based Holistic Texture Rectification and Synthesis [26.144666226217062]
Traditional texture synthesis approaches focus on generating textures from pristine samples.
We propose a framework that synthesizes holistic textures from degraded samples in natural images.
arXiv Detail & Related papers (2023-09-26T08:44:46Z)
- Texture Representation via Analysis and Synthesis with Generative Adversarial Networks [11.67779950826776]
We investigate data-driven texture modeling via analysis and synthesis with generative adversarial networks.
We adopt StyleGAN3 for synthesis and demonstrate that it produces diverse textures beyond those represented in the training data.
For texture analysis, we propose GAN inversion using a novel latent consistency criterion for synthesized textures, and iterative refinement with a Gramian loss for real textures.
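The Gramian loss referred to here is, in its standard form, a squared difference between channel correlation matrices of feature maps; a minimal sketch (the paper's exact variant may differ):

```python
import torch

def gram_loss(feat_a, feat_b):
    """Classic Gramian (texture) loss: compare channel correlation
    matrices of two feature maps."""
    def gram(f):                       # f: (C, H, W) feature map
        c, h, w = f.shape
        f = f.reshape(c, h * w)
        return (f @ f.t()) / (c * h * w)
    return ((gram(feat_a) - gram(feat_b)) ** 2).mean()

fa = torch.randn(64, 32, 32)  # e.g. VGG features of a real texture
fb = torch.randn(64, 32, 32)  # features of the current synthesis
print(gram_loss(fa, fb))      # scalar to minimize during refinement
```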
arXiv Detail & Related papers (2022-12-20T03:57:11Z)
- Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) to explore object-to-object, object-to-patch, and patch-to-patch dependencies.
arXiv Detail & Related papers (2022-06-02T08:34:25Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
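A minimal sketch of warping modulation parameters with a flow field via `grid_sample` (hypothetical shapes and names; not the SAWN implementation):

```python
import torch
import torch.nn.functional as F

def warped_modulation(feat, gamma, beta, flow):
    """Hypothetical sketch of warped normalization: modulation maps
    (gamma, beta) are displaced by a flow field before being applied
    to instance-normalized features."""
    n, _, h, w = feat.shape
    # base sampling grid in [-1, 1], displaced by the flow field
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2) + flow
    gamma_w = F.grid_sample(gamma, grid, align_corners=False)
    beta_w = F.grid_sample(beta, grid, align_corners=False)
    return F.instance_norm(feat) * (1 + gamma_w) + beta_w

feat = torch.randn(1, 8, 16, 16)
gamma = torch.randn(1, 8, 16, 16)
beta = torch.randn(1, 8, 16, 16)
flow = 0.1 * torch.randn(1, 16, 16, 2)   # learned offsets in practice
print(warped_modulation(feat, gamma, beta, flow).shape)  # (1, 8, 16, 16)
```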
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study these properties via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- Dynamic Texture Synthesis by Incorporating Long-range Spatial and Temporal Correlations [27.247382497265214]
We introduce a new loss term, called the Shifted Gram loss, to capture the structural and long-range correlation of the reference texture video.
We also introduce a frame sampling strategy to exploit long-period motion across multiple frames.
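A minimal sketch of one plausible "shifted Gram" formulation, correlating a feature map with a spatially shifted copy of itself so the statistics also encode long-range structure (hypothetical; the paper's exact definition may differ):

```python
import torch

def shifted_gram_loss(fa, fb, shift=4):
    """Hypothetical 'shifted Gram' sketch: Gram matrices between a
    feature map and a horizontally shifted copy of itself capture
    longer-range correlations than plain Gram statistics."""
    def shifted_gram(f):                          # f: (C, H, W)
        c = f.shape[0]
        a = f[:, :, :-shift].reshape(c, -1)       # original columns
        b = f[:, :, shift:].reshape(c, -1)        # shifted columns
        return (a @ b.t()) / a.shape[1] / c
    return ((shifted_gram(fa) - shifted_gram(fb)) ** 2).mean()

fa = torch.randn(32, 24, 24)  # reference texture features
fb = torch.randn(32, 24, 24)  # synthesized texture features
print(shifted_gram_loss(fa, fb))
```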
arXiv Detail & Related papers (2021-04-13T05:04:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.