Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
- URL: http://arxiv.org/abs/2401.16224v1
- Date: Mon, 29 Jan 2024 15:21:37 GMT
- Title: Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
- Authors: Zhongjie Duan, Chengyu Wang, Cen Chen, Weining Qian, Jun Huang
- Abstract summary: Toon shading is a non-photorealistic rendering task in animation.
Diffutoon is capable of rendering remarkably detailed, high-resolution, and extended-duration videos in anime style.
- Score: 25.903156244291168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Toon shading is a non-photorealistic rendering task in animation. Its
primary purpose is to render objects with a flat and stylized appearance. As
diffusion models have ascended to the forefront of image synthesis
methodologies, this paper delves into an innovative form of toon shading based
on diffusion models, aiming to directly render photorealistic videos into anime
styles. In video stylization, extant methods encounter persistent challenges,
notably in maintaining consistency and achieving high visual quality. In this
paper, we model the toon shading problem as four subproblems: stylization,
consistency enhancement, structure guidance, and colorization. To address the
challenges in video stylization, we propose an effective toon shading approach
called Diffutoon. Diffutoon is capable of rendering remarkably
detailed, high-resolution, and extended-duration videos in anime style. It can
also edit the content according to prompts via an additional branch. The
efficacy of Diffutoon is evaluated through quantitative metrics and human
evaluation. Notably, Diffutoon surpasses both open-source and closed-source
baseline approaches in our experiments. Our work is accompanied by the release
of both the source code and example videos on GitHub (project page:
https://ecnu-cilab.github.io/DiffutoonProjectPage/).
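The abstract frames the method as four cooperating subproblems (stylization, consistency enhancement, structure guidance, colorization) plus an optional prompt-driven editing branch. The sketch below is a minimal, hypothetical illustration of how such a decomposition could be wired together in Python; every class, method, and parameter name here is an assumption made for clarity, not the authors' released API, and each stage is a placeholder stub.

```python
# Hypothetical sketch of the four-subproblem decomposition named in the
# abstract (stylization, consistency enhancement, structure guidance,
# colorization) plus an optional prompt-driven editing branch.
# All components are placeholder stubs, NOT the released Diffutoon code.
from dataclasses import dataclass
from typing import List, Optional

import numpy as np

Frame = np.ndarray  # an H x W x 3 video frame


@dataclass
class ToonShadingSketch:
    style_prompt: str = "anime style, best quality"

    def stylize(self, frame: Frame) -> Frame:
        # Subproblem 1: per-frame anime stylization (in practice, a diffusion
        # denoising pass conditioned on style_prompt). Stubbed as identity.
        return frame

    def guide_structure(self, frame: Frame, source: Frame) -> Frame:
        # Subproblem 3: keep outlines and geometry aligned with the source frame.
        return frame

    def colorize(self, frame: Frame, source: Frame) -> Frame:
        # Subproblem 4: keep colors faithful to the source video.
        return frame

    def enhance_consistency(self, frames: List[Frame]) -> List[Frame]:
        # Subproblem 2: cross-frame smoothing to suppress flicker.
        return frames

    def edit(self, frames: List[Frame], edit_prompt: Optional[str]) -> List[Frame]:
        # Optional editing branch: re-render content according to a prompt.
        return frames

    def render(self, video: List[Frame], edit_prompt: Optional[str] = None) -> List[Frame]:
        per_frame = [
            self.colorize(self.guide_structure(self.stylize(f), f), f)
            for f in video
        ]
        smoothed = self.enhance_consistency(per_frame)
        return self.edit(smoothed, edit_prompt)


if __name__ == "__main__":
    dummy_video = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(8)]
    result = ToonShadingSketch().render(dummy_video, edit_prompt="silver hair")
    print(len(result), result[0].shape)  # 8 (64, 64, 3)
```

In the actual system these stages would presumably operate inside a diffusion sampling loop rather than as post-hoc per-frame functions; the stub structure is only meant to make the subproblem decomposition concrete.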
Related papers
- Improving Virtual Try-On with Garment-focused Diffusion Models [91.95830983115474]
Diffusion models have revolutionized generative modeling in numerous image synthesis tasks.
We design a new diffusion model, GarDiff, which triggers the garment-focused diffusion process.
Experiments on VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches.
arXiv Detail & Related papers (2024-09-12T17:55:11Z) - WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z) - RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting [63.567363455092234]
RefFusion is a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view.
Our framework achieves state-of-the-art results for object removal while maintaining high controllability.
arXiv Detail & Related papers (2024-04-16T17:50:02Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - APISR: Anime Production Inspired Real-World Anime Super-Resolution [15.501488335115269]
We argue that video networks and datasets are not necessary for anime SR due to the repeated use of hand-drawn frames.
Instead, we propose an anime image collection pipeline by choosing the least compressed and the most informative frames from the video sources.
We evaluate our method through extensive experiments on the public benchmark, showing our method outperforms state-of-the-art anime dataset-trained approaches.
arXiv Detail & Related papers (2024-03-03T19:52:43Z) - DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z) - Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style
Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video.
It uses a generative pre-trained transformer with an image latent diffusion model to achieve concise text-controlled video stylization.
Tests show that we attain superior content preservation and stylistic performance while incurring lower computational cost than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z) - AnimeDiffusion: Anime Face Line Drawing Colorization via Diffusion
Models [24.94532405404846]
We propose a novel method called AnimeDiffusion using diffusion models that performs anime face line drawing colorization automatically.
We construct an anime face line drawing colorization benchmark dataset, which contains 31,696 training samples and 579 test samples.
We demonstrate that AnimeDiffusion outperforms state-of-the-art GAN-based models for anime face line drawing colorization.
arXiv Detail & Related papers (2023-03-20T14:15:23Z) - PointAvatar: Deformable Point-based Head Avatars from Videos [103.43941945044294]
PointAvatar is a deformable point-based representation that disentangles the source color into intrinsic albedo and normal-dependent shading.
We show that our method is able to generate animatable 3D avatars using monocular videos from multiple sources.
arXiv Detail & Related papers (2022-12-16T10:05:31Z) - SINE: SINgle Image Editing with Text-to-Image Diffusion Models [10.67527134198167]
This work aims to address the problem of single-image editing.
We propose novel model-based guidance built upon classifier-free guidance.
We show promising editing capabilities, including changing style, content addition, and object manipulation.
arXiv Detail & Related papers (2022-12-08T18:57:13Z)