Paying U-Attention to Textures: Multi-Stage Hourglass Vision Transformer for Universal Texture Synthesis
- URL: http://arxiv.org/abs/2202.11703v3
- Date: Thu, 8 Aug 2024 03:09:21 GMT
- Title: Paying U-Attention to Textures: Multi-Stage Hourglass Vision Transformer for Universal Texture Synthesis
- Authors: Shouchang Guo, Valentin Deschaintre, Douglas Noll, Arthur Roullier,
- Abstract summary: We present a novel U-Attention vision Transformer for universal texture synthesis.
We exploit the natural long-range dependencies enabled by the attention mechanism to allow our approach to synthesize diverse textures.
We propose a hierarchical hourglass backbone that attends to the global structure and performs patch mapping at varying scales.
- Score: 2.8998926117101367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel U-Attention vision Transformer for universal texture synthesis. We exploit the natural long-range dependencies enabled by the attention mechanism to allow our approach to synthesize diverse textures while preserving their structures in a single inference. We propose a hierarchical hourglass backbone that attends to the global structure and performs patch mapping at varying scales in a coarse-to-fine-to-coarse stream. Completed by skip connection and convolution designs that propagate and fuse information at different scales, our hierarchical U-Attention architecture unifies attention to features from macro structures to micro details, and progressively refines synthesis results at successive stages. Our method achieves stronger 2$\times$ synthesis than previous work on both stochastic and structured textures while generalizing to unseen textures without fine-tuning. Ablation studies demonstrate the effectiveness of each component of our architecture.
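To make the coarse-to-fine-to-coarse idea concrete, below is a minimal PyTorch sketch of an hourglass of patch-wise self-attention stages with skip connections fused by convolutions. All module names, patch sizes, channel counts, and the number of stages are illustrative assumptions for this sketch, not the paper's actual U-Attention architecture, and the "2x synthesis" usage example simply upsamples the exemplar before the forward pass to exercise the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionStage(nn.Module):
    """Self-attention over non-overlapping patches at one scale (illustrative)."""
    def __init__(self, channels, patch_size, heads=4):
        super().__init__()
        self.patch_size = patch_size
        dim = channels * patch_size * patch_size
        self.attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )

    def forward(self, x):
        B, C, H, W = x.shape
        p = self.patch_size
        # (B, C, H, W) -> (B, num_patches, C*p*p): each token is one p x p patch
        tokens = F.unfold(x, kernel_size=p, stride=p).transpose(1, 2)
        tokens = self.attn(tokens)
        # fold the attended tokens back into an image-shaped feature map
        return F.fold(tokens.transpose(1, 2), output_size=(H, W),
                      kernel_size=p, stride=p)

class HourglassUAttention(nn.Module):
    """Coarse-to-fine-to-coarse attention with skip connections (a sketch)."""
    def __init__(self, channels=16):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.coarse_in = AttentionStage(channels, patch_size=8)   # large patches: global structure
        self.fine = AttentionStage(channels, patch_size=2)        # small patches: local detail
        self.coarse_out = AttentionStage(channels, patch_size=8)
        # convolution that fuses skip-connected features between stages
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.head = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        f = self.stem(x)
        c = self.coarse_in(f)                      # attend to macro structure
        m = self.fine(c)                           # refine with fine patch mapping
        m = self.fuse(torch.cat([m, c], dim=1))    # skip connection + conv fusion
        out = self.coarse_out(m)                   # return to the coarse scale
        return self.head(out)

# Example: emulate 2x synthesis by feeding an upsampled exemplar (assumption)
model = HourglassUAttention()
exemplar = torch.randn(1, 3, 64, 64)
big = F.interpolate(exemplar, scale_factor=2, mode="bilinear", align_corners=False)
result = model(big)   # (1, 3, 128, 128)
```

The intended reading is that the coarse stages attend over a small number of large patches (global structure), while the intermediate fine stage attends over many small patches (micro details), and the convolutional fusion of skip-connected features lets later stages progressively refine earlier ones.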
Related papers
- Generating Non-Stationary Textures using Self-Rectification [70.91414475376698]
This paper addresses the challenge of example-based non-stationary texture synthesis.
We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools.
Our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture.
arXiv Detail & Related papers (2024-01-05T15:07:05Z) - Consistent123: Improve Consistency for One Image to 3D Object Synthesis [74.1094516222327]
Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability.
These models have no guarantee of view consistency, limiting the performance for downstream tasks like 3D reconstruction and image-to-3D generation.
We propose Consistent123 to synthesize novel views simultaneously by incorporating additional cross-view attention layers and the shared self-attention mechanism.
arXiv Detail & Related papers (2023-10-12T07:38:28Z) - Pyramid Texture Filtering [86.15126028139736]
We present a simple but effective technique to smooth out textures while preserving the prominent structures.
Our method is built upon a key observation -- the coarsest level in a Gaussian pyramid often naturally eliminates textures and summarizes the main image structures.
We show that our approach is effective to separate structure from texture of different scales, local contrasts, and forms, without degrading structures or introducing visual artifacts.
arXiv Detail & Related papers (2023-05-11T02:05:30Z) - A geometrically aware auto-encoder for multi-texture synthesis [1.2891210250935146]
We propose an auto-encoder architecture for multi-texture synthesis.
Images are embedded in a compact and geometrically consistent latent space.
Texture synthesis and related tasks can be performed directly from these latent codes.
arXiv Detail & Related papers (2023-02-03T09:28:39Z) - Towards Universal Texture Synthesis by Combining Texton Broadcasting
with Noise Injection in StyleGAN-2 [11.67779950826776]
We present a new approach for universal texture synthesis that incorporates a multi-scale texton broadcasting module in the StyleGAN-2 framework.
The texton broadcasting module introduces an inductive bias, enabling generation of a broader range of textures, from those with regular structures to completely stochastic ones.
arXiv Detail & Related papers (2022-03-08T17:44:35Z) - Texture Reformer: Towards Fast and Universal Interactive Texture
Transfer [16.41438144343516]
Texture Reformer is a neural-based framework for interactive texture transfer with user-specified guidance.
We introduce a novel learning-free view-specific texture reformation (VSTR) operation with a new semantic map guidance strategy.
The experimental results on a variety of application scenarios demonstrate the effectiveness and superiority of our framework.
arXiv Detail & Related papers (2021-12-06T05:20:43Z) - Dynamic Texture Synthesis by Incorporating Long-range Spatial and
Temporal Correlations [27.247382497265214]
We introduce a new loss term, called the Shifted Gram loss, to capture the structural and long-range correlations of the reference texture video (a minimal sketch of this idea appears after this list).
We also introduce a frame sampling strategy to exploit long-period motion across multiple frames.
arXiv Detail & Related papers (2021-04-13T05:04:51Z) - Transposer: Universal Texture Synthesis Using Feature Maps as Transposed
Convolution Filter [43.9258342767253]
We propose a novel way of using transposed convolution operation for texture synthesis.
Our framework achieves state-of-the-art texture synthesis quality based on various metrics.
arXiv Detail & Related papers (2020-07-14T17:57:59Z) - Region-adaptive Texture Enhancement for Detailed Person Image Synthesis [86.69934638569815]
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments conducted on the DeepFashion benchmark dataset demonstrate the superiority of our framework compared with existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z) - Towards Analysis-friendly Face Representation with Scalable Feature and
Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and the enhancement layer is further filled with feature-level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z) - Hierarchy Composition GAN for High-fidelity Image Synthesis [57.32311953820988]
This paper presents an innovative Hierarchical Composition GAN (HIC-GAN).
HIC-GAN incorporates image synthesis in geometry and appearance domains into an end-to-end trainable network.
Experiments on scene text image synthesis, portrait editing and indoor rendering tasks show that the proposed HIC-GAN achieves superior synthesis performance qualitatively and quantitatively.
arXiv Detail & Related papers (2019-05-12T11:11:24Z)
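The Shifted Gram loss mentioned in the dynamic texture synthesis entry above can be illustrated with a short sketch. The Python/PyTorch code below is an assumption about how such a loss might look given only the summary: the shift offsets, the choice of feature maps, and the normalization are guesses, not the paper's definition. The idea shown is to match Gram matrices not only of each feature map with itself, but also with spatially shifted copies, so the loss becomes sensitive to longer-range structure.

```python
import torch

def gram(a, b):
    # a, b: (B, C, H, W) feature maps; returns channel-wise Gram matrices (B, C, C)
    B, C, H, W = a.shape
    fa = a.reshape(B, C, H * W)
    fb = b.reshape(B, C, H * W)
    return fa @ fb.transpose(1, 2) / (C * H * W)

def shifted_gram_loss(feat_gen, feat_ref, shifts=((0, 8), (8, 0))):
    # Standard Gram term matching local texture statistics.
    loss = torch.mean((gram(feat_gen, feat_gen) - gram(feat_ref, feat_ref)) ** 2)
    # Additional terms correlate each feature map with spatially shifted copies
    # of itself, capturing structural, long-range correlations.
    for dy, dx in shifts:
        g_gen = gram(feat_gen, torch.roll(feat_gen, shifts=(dy, dx), dims=(2, 3)))
        g_ref = gram(feat_ref, torch.roll(feat_ref, shifts=(dy, dx), dims=(2, 3)))
        loss = loss + torch.mean((g_gen - g_ref) ** 2)
    return loss

# Example with random stand-ins for per-frame feature maps
f_gen, f_ref = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64)
print(shifted_gram_loss(f_gen, f_ref))
```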