Dynamic Texture Transfer using PatchMatch and Transformers
- URL: http://arxiv.org/abs/2402.00606v1
- Date: Thu, 1 Feb 2024 13:58:32 GMT
- Title: Dynamic Texture Transfer using PatchMatch and Transformers
- Authors: Guo Pu, Shiyao Xu, Xixin Cao, Zhouhui Lian
- Abstract summary: We propose to handle the task of dynamic texture transfer via a simple yet effective model that utilizes both PatchMatch and Transformers.
The key idea is to decompose the task of dynamic texture transfer into two stages. In the first stage, the start frame of the target video with the desired dynamic texture is synthesized via PatchMatch-based texture transfer.
In the second stage, the synthesized image is decomposed into structure-agnostic patches, from which the corresponding patches of the subsequent frames are predicted.
- Score: 18.54386654063111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to automatically transfer the dynamic texture of a given video to the
target still image is a challenging and ongoing problem. In this paper, we
propose to handle this task via a simple yet effective model that utilizes both
PatchMatch and Transformers. The key idea is to decompose the task of dynamic
texture transfer into two stages, where the start frame of the target video
with the desired dynamic texture is synthesized in the first stage via a
distance map guided texture transfer module based on the PatchMatch algorithm.
Then, in the second stage, the synthesized image is decomposed into
structure-agnostic patches, according to which their corresponding subsequent
patches can be predicted by exploiting the powerful capability of Transformers
equipped with VQ-VAE for processing long discrete sequences. After getting all
those patches, we apply a Gaussian weighted average merging strategy to
smoothly assemble them into each frame of the target stylized video.
Experimental results demonstrate the effectiveness and superiority of the
proposed method in dynamic texture transfer compared to the state of the art.
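The abstract does not spell out how the Gaussian weighted average merging is implemented. Below is a minimal NumPy sketch of that final assembly step, assuming square patches with known top-left positions; the patch size, Gaussian width, and function names are illustrative choices, not values from the paper.

```python
import numpy as np

def gaussian_window(patch_size: int, sigma: float) -> np.ndarray:
    """2D Gaussian weight map peaking at the patch centre."""
    ax = np.arange(patch_size) - (patch_size - 1) / 2.0
    g1d = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return np.outer(g1d, g1d)  # shape (patch_size, patch_size)

def merge_patches(patches, positions, frame_shape, sigma=8.0):
    """Blend overlapping predicted patches into one output frame.

    patches     : list of (P, P, C) arrays produced by the second stage
    positions   : list of (top, left) integer coordinates for each patch
    frame_shape : (H, W, C) of the target frame
    """
    frame = np.zeros(frame_shape, dtype=np.float64)
    weight = np.zeros(frame_shape[:2], dtype=np.float64)
    for patch, (top, left) in zip(patches, positions):
        p = patch.shape[0]
        w = gaussian_window(p, sigma)
        frame[top:top + p, left:left + p] += patch * w[..., None]
        weight[top:top + p, left:left + p] += w
    # Normalise by the accumulated Gaussian weights (guard against zeros).
    return frame / np.maximum(weight, 1e-8)[..., None]
```

With a stride smaller than the patch size, every pixel is covered by several patches, and the Gaussian falloff down-weights patch borders, which suppresses visible seams where neighbouring predictions disagree slightly.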
Related papers
- Patch Is Not All You Need [57.290256181083016]
We propose a novel Pattern Transformer to adaptively convert images to pattern sequences for Transformer input.
We employ a convolutional neural network to extract various patterns from the input image.
We have accomplished state-of-the-art performance on CIFAR-10 and CIFAR-100, and have achieved competitive results on ImageNet.
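The two sentences above only state that a convolutional network extracts patterns which are then handed to a Transformer as a sequence. A generic hybrid of that kind can be sketched as follows (PyTorch; the layer sizes, token pooling, and classification head are illustrative assumptions, not the paper's architecture).

```python
import torch
import torch.nn as nn

class PatternToSequence(nn.Module):
    """Toy CNN-then-Transformer pipeline: convolutional feature maps are
    flattened into a token sequence and processed by a Transformer encoder."""

    def __init__(self, in_ch: int = 3, dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.cnn = nn.Sequential(  # pattern extractor
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                          # x: (B, C, H, W)
        feats = self.cnn(x)                        # (B, dim, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, N, dim) token sequence
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1))       # pooled classification logits
```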
arXiv Detail & Related papers (2023-08-21T13:54:00Z)
- Dual-path Adaptation from Image to Video Transformers [62.056751480114784]
We efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters.
We propose a novel DualPath adaptation separated into spatial and temporal adaptation paths, where a lightweight bottleneck adapter is employed in each transformer block.
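The lightweight bottleneck adapter is mentioned above only by name; a common form of such a module is sketched below (PyTorch). The bottleneck width and the zero-initialised up-projection are assumptions in the spirit of parameter-efficient adaptation, not the paper's exact design.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, residual connection.
    Only these few parameters are trained; the frozen backbone stays fixed."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # adapter starts as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):               # x: (B, N, dim) token features
        return x + self.up(self.act(self.down(x)))
```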
arXiv Detail & Related papers (2023-03-17T09:37:07Z)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
- Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
We compress RGB images into patch tokens and propose the Transformer with Focal Attention (TwFA) to explore object-to-object, object-to-patch, and patch-to-patch dependencies.
arXiv Detail & Related papers (2022-06-02T08:34:25Z)
- FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting [77.8621673355983]
We propose FuseFormer, a Transformer model designed for video inpainting via fine-grained feature fusion.
We elaborately insert the soft composition and soft split into the feed-forward network, enabling the 1D linear layers to have the capability of modelling 2D structure.
In both quantitative and qualitative evaluations, our proposed FuseFormer surpasses state-of-the-art methods.
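Soft split and soft composition are not defined in this summary; a standard way to realise overlapping patch splitting and overlap-averaged re-merging is with unfold/fold, sketched below (PyTorch; the kernel size and stride are assumptions).

```python
import torch
import torch.nn.functional as F

def soft_split(x, kernel: int = 7, stride: int = 3):
    """Split a feature map into overlapping patch tokens.
    x: (B, C, H, W) -> (B, N, C * kernel * kernel)"""
    return F.unfold(x, kernel_size=kernel, stride=stride).transpose(1, 2)

def soft_composition(tokens, out_size, kernel: int = 7, stride: int = 3):
    """Merge overlapping patch tokens back into a map, averaging overlaps.
    tokens: (B, N, C * kernel * kernel) -> (B, C, H, W)"""
    patches = tokens.transpose(1, 2)
    summed = F.fold(patches, output_size=out_size, kernel_size=kernel, stride=stride)
    counts = F.fold(torch.ones_like(patches), output_size=out_size,
                    kernel_size=kernel, stride=stride)
    return summed / counts.clamp(min=1e-8)
```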
arXiv Detail & Related papers (2021-09-07T10:13:29Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
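The summary says only that a learned flow field warps modulation parameters. The snippet below sketches the warping-and-modulation mechanics with grid_sample under assumed tensor shapes; it is not the paper's SAWN module.

```python
import torch
import torch.nn.functional as F

def warp(params, flow):
    """Warp per-pixel modulation maps (e.g. gamma or beta) with a dense flow field.
    params: (B, C, H, W), flow: (B, 2, H, W) pixel offsets."""
    B, _, H, W = flow.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=flow.device),
        torch.linspace(-1, 1, W, device=flow.device),
        indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    # Convert pixel offsets to normalised [-1, 1] coordinates and add them.
    offset = torch.stack((flow[:, 0] / ((W - 1) / 2.0),
                          flow[:, 1] / ((H - 1) / 2.0)), dim=-1)
    return F.grid_sample(params, base + offset, align_corners=True)

def modulate(x, gamma, beta):
    """Spatially adaptive modulation of instance-normalised features."""
    return F.instance_norm(x) * (1.0 + gamma) + beta
```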
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On [14.198545992098309]
SieveNet is a framework for robust image-based virtual try-on.
We introduce a multi-stage coarse-to-fine warping network to better model fine-grained intricacies.
We also introduce a segmentation mask prior, conditioned on the try-on cloth, to improve the texture transfer network.
arXiv Detail & Related papers (2020-01-17T12:33:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.