CartoonDiff: Training-free Cartoon Image Generation with Diffusion
Transformer Models
- URL: http://arxiv.org/abs/2309.08251v1
- Date: Fri, 15 Sep 2023 08:55:59 GMT
- Title: CartoonDiff: Training-free Cartoon Image Generation with Diffusion
Transformer Models
- Authors: Feihong He, Gang Li, Lingyu Si, Leilei Yan, Shimeng Hou, Hongwei Dong,
Fanzhang Li
- Abstract summary: We present CartoonDiff, a novel training-free sampling approach that performs image cartoonization using diffusion transformer models.
We implement the image cartoonization process by normalizing the high-frequency signal of the noisy image in specific denoising steps.
- Score: 5.830731563895666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image cartoonization has attracted significant interest in the field of image
generation. However, most of the existing image cartoonization techniques
require re-training models using images of cartoon style. In this paper, we
present CartoonDiff, a novel training-free sampling approach that performs
image cartoonization using diffusion transformer models. Specifically, we
decompose the reverse process of diffusion models into the semantic generation
phase and the detail generation phase. Furthermore, we implement the image
cartoonization process by normalizing the high-frequency signal of the noisy image
in specific denoising steps. CartoonDiff doesn't require any additional
reference images, complex model designs, or the tedious adjustment of multiple
parameters. Extensive experimental results show the powerful ability of our
CartoonDiff. The project page is available at: https://cartoondiff.github.io/
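The abstract describes the method only at a high level, so the sketch below is a minimal, hypothetical PyTorch illustration of what normalizing the high-frequency signal of a denoised estimate during selected denoising steps might look like. The names split_frequencies, normalize_high_freq, keep_ratio, cartoonization_steps, dit_model, and ddim_step, as well as the box low-pass filter and the per-sample rescaling, are assumptions made for illustration and are not taken from the authors' implementation.

```python
import torch
import torch.fft as fft


def split_frequencies(x: torch.Tensor, keep_ratio: float = 0.25):
    """Split a (B, C, H, W) batch into low- and high-frequency parts
    using a centered box low-pass filter in the Fourier domain."""
    _, _, H, W = x.shape
    spec = fft.fftshift(fft.fft2(x, norm="ortho"), dim=(-2, -1))

    yy = torch.arange(H, device=x.device).view(H, 1) - H // 2
    xx = torch.arange(W, device=x.device).view(1, W) - W // 2
    low_mask = ((yy.abs() <= keep_ratio * H / 2) &
                (xx.abs() <= keep_ratio * W / 2)).to(spec.dtype)

    low = fft.ifft2(fft.ifftshift(spec * low_mask, dim=(-2, -1)),
                    norm="ortho").real
    return low, x - low


def normalize_high_freq(x: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """Rescale the high-frequency residual to unit standard deviation per
    sample. The exact normalization used by CartoonDiff is not specified in
    the abstract; this is only an illustrative stand-in."""
    low, high = split_frequencies(x, keep_ratio)
    std = high.flatten(1).std(dim=1).view(-1, 1, 1, 1)
    return low + high / (std + 1e-8)


if __name__ == "__main__":
    # Tiny smoke test on random data standing in for a denoised estimate.
    x0_hat = torch.randn(2, 3, 64, 64)
    print(normalize_high_freq(x0_hat).shape)  # torch.Size([2, 3, 64, 64])

# Schematic placement inside a reverse-diffusion loop (comments only):
# for t in reversed(range(T)):
#     x0_hat = dit_model(x_t, t)            # denoised estimate from the DiT
#     if t in cartoonization_steps:         # steps in the "detail generation phase"
#         x0_hat = normalize_high_freq(x0_hat)
#     x_t = ddim_step(x_t, x0_hat, t)       # standard sampler update
```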
Related papers
- DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation [46.5013105017258]
Diffusion models are trained by denoising a Markovian process that gradually adds noise to the input.
We propose DART, a transformer-based model that unifies autoregressive (AR) and diffusion within a non-Markovian framework.
arXiv Detail & Related papers (2024-10-10T17:41:54Z) - Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images [54.56070204172398]
We propose a simple yet effective pipeline for stylizing a 3D scene.
We perform 3D style transfer by refining the source NeRF model using stylized images generated by a style-aligned image-to-image diffusion model.
We demonstrate that our method can transfer diverse artistic styles to real-world 3D scenes with competitive quality.
arXiv Detail & Related papers (2024-06-19T09:36:18Z) - UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation [53.16986875759286]
We present a UniAnimate framework to enable efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports random noised input as well as first frame conditioned input.
arXiv Detail & Related papers (2024-06-03T10:51:10Z) - FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - Pix2Gif: Motion-Guided Diffusion for GIF Generation [70.64240654310754]
We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video) generation.
We propose a new motion-guided warping module to spatially transform the features of the source image conditioned on the two types of prompts.
In preparation for the model training, we meticulously curated data by extracting coherent image frames from the TGIF video-caption dataset.
arXiv Detail & Related papers (2024-03-07T16:18:28Z) - AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z) - Instance-guided Cartoon Editing with a Large-scale Dataset [12.955181769243232]
We present an instance-aware image segmentation model that can generate accurate, high-resolution segmentation masks for characters in cartoon images.
We show that the proposed approach enables a range of segmentation-dependent cartoon editing applications, such as 3D Ken Burns parallax effects, text-guided cartoon style editing, and puppet animation from illustrations and manga.
arXiv Detail & Related papers (2023-12-04T15:00:15Z) - A Method for Training-free Person Image Picture Generation [4.043367784553845]
A Character Image Feature model is proposed in this paper.
It enables the user to simply provide a picture of the character so that the character in the generated image matches the expectation.
The proposed model can be conveniently incorporated into the Stable Diffusion generation process without modifying the model, or used in combination with Stable Diffusion as a joint model.
arXiv Detail & Related papers (2023-05-16T21:46:28Z) - Null-text Guidance in Diffusion Models is Secretly a Cartoon-style
Creator [20.329795810937206]
Null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., the generated images can be efficiently transformed into cartoons by simply perturbing the null-text guidance.
We propose two disturbance methods, i.e., Rollback disturbance (Back-D) and Image disturbance (Image-D), to construct misalignment between the noisy images used for predicting null-text guidance and text guidance.
Back-D achieves cartoonization by altering the noise level of the null-text noisy image via replacing $x_t$ with $x_{t+\Delta t}$.
arXiv Detail & Related papers (2023-05-11T10:36:52Z) - NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real
Image Animation [66.0838349951456]
Nerf-based Generative models have shown impressive capacity in generating high-quality images with consistent 3D geometry.
We propose a universal method to surgically fine-tune these NeRF-GAN models in order to achieve high-fidelity animation of real subjects from only a single image.
arXiv Detail & Related papers (2022-11-30T18:36:45Z) - Learning to Incorporate Texture Saliency Adaptive Attention to Image
Cartoonization [20.578335938736384]
A novel cartoon-texture-saliency-sampler (CTSS) module is proposed to dynamically sample cartoon-texture-salient patches from training data.
With extensive experiments, we demonstrate that texture-saliency adaptive attention in adversarial learning is of significant importance in facilitating and enhancing image cartoonization.
arXiv Detail & Related papers (2022-08-02T16:45:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.