Sakuga-42M Dataset: Scaling Up Cartoon Research
- URL: http://arxiv.org/abs/2405.07425v1
- Date: Mon, 13 May 2024 01:50:05 GMT
- Title: Sakuga-42M Dataset: Scaling Up Cartoon Research
- Authors: Zhenglin Pan, Yu Zhu, Yuxuan Mu,
- Abstract summary: Sakuga-42M comprises 42 million keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations.
Our motivation is to bring large-scale data to cartoon research and foster generalization and robustness in future cartoon applications.
- Score: 4.676528353567339
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a notable bias in hand-drawn cartoons that diverges from the distribution of natural videos. Can we harness the success of the scaling paradigm to benefit cartoon research? Unfortunately, until now, there has not been a sizable cartoon dataset available for exploration. In this research, we propose the Sakuga-42M Dataset, the first large-scale cartoon animation dataset. Sakuga-42M comprises 42 million keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, content taxonomies, etc. We pioneer the benefits of such a large-scale cartoon dataset on comprehension and generation tasks by finetuning contemporary foundation models like Video CLIP, Video Mamba, and SVD, achieving outstanding performance on cartoon-related tasks. Our motivation is to introduce large-scaling to cartoon research and foster generalization and robustness in future cartoon applications. Dataset, Code, and Pretrained Models will be publicly available.
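The abstract describes per-clip semantic annotations (video-text description pairs, anime tags, content taxonomies) and reports finetuning CLIP-style foundation models on them. As a rough illustration of how such annotations might be consumed for contrastive video-text finetuning, here is a minimal sketch. The JSONL record layout, field names, and toy encoders below are assumptions for illustration only; the actual Sakuga-42M format and the Video CLIP / Video Mamba / SVD training recipes are not specified in this listing.

```python
# Hedged sketch: a hypothetical annotation schema plus a CLIP-style
# symmetric contrastive loss over matched (clip, caption) pairs.
import json
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class ClipRecord:
    clip_id: str        # identifier of a keyframe clip (assumed field name)
    caption: str        # video-text description
    tags: list          # anime tags, e.g. ["running", "smear"]
    taxonomy: str       # content taxonomy label


def load_records(jsonl_path: str) -> list:
    """Read one annotation record per line from a hypothetical JSONL file."""
    records = []
    with open(jsonl_path) as f:
        for line in f:
            records.append(ClipRecord(**json.loads(line)))
    return records


class ToyVideoEncoder(nn.Module):
    """Stand-in for a video encoder: projects pooled frame features to an embedding."""
    def __init__(self, in_dim=512, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, frame_feats):            # (B, in_dim)
        return F.normalize(self.proj(frame_feats), dim=-1)


class ToyTextEncoder(nn.Module):
    """Stand-in for a text encoder: projects pooled token features to an embedding."""
    def __init__(self, in_dim=512, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, text_feats):             # (B, in_dim)
        return F.normalize(self.proj(text_feats), dim=-1)


def clip_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; diagonal entries are the matched pairs."""
    logits = video_emb @ text_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Random features stand in for real frame/token features from a backbone.
    video_enc, text_enc = ToyVideoEncoder(), ToyTextEncoder()
    frame_feats, text_feats = torch.randn(8, 512), torch.randn(8, 512)
    loss = clip_contrastive_loss(video_enc(frame_feats), text_enc(text_feats))
    print(f"contrastive loss: {loss.item():.3f}")
```

In practice the stub encoders would be replaced by the pretrained video and text towers being finetuned, with the dataset's captions and keyframe clips batched through a dataloader.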
Related papers
- L4GM: Large 4D Gaussian Reconstruction Model [99.82220378522624]
We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input.
Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects.
arXiv Detail & Related papers (2024-06-14T17:51:18Z) - AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations.
Key technique used in our AnimateZoo is subject alignment, which includes two steps.
Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z) - Instance-guided Cartoon Editing with a Large-scale Dataset [12.955181769243232]
We present an instance-aware image segmentation model that can generate accurate, high-resolution segmentation masks for characters in cartoon images.
We demonstrate that the proposed approach enables a range of segmentation-dependent cartoon editing applications like 3D Ken Burns parallax effects, text-guided cartoon style editing, and puppet animation from illustrations and manga.
arXiv Detail & Related papers (2023-12-04T15:00:15Z) - MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z) - Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution [59.71387128485845]
We explore the characteristics of animation videos and leverage the rich priors in real-world animation data for a more practical animation VSR model.
We propose a multi-scale Vector-Quantized Degradation model for animation video Super-Resolution (VQD-SR) to decompose the local details from global structures.
A rich-content Real Animation Low-quality (RAL) video dataset is collected for extracting the priors.
arXiv Detail & Related papers (2023-03-17T08:11:14Z) - AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies [98.65469430034246]
Existing datasets for two-dimensional (2D) cartoon animation suffer from simple frame composition and monotonic movements.
We present a new 2D animation visual correspondence dataset, AnimeRun, by converting open source 3D movies to full scenes in 2D style.
Our analyses show that the proposed dataset not only resembles real anime more in image composition, but also possesses richer and more complex motion patterns compared to existing datasets.
arXiv Detail & Related papers (2022-11-10T17:26:21Z) - AnimeCeleb: Large-Scale Animation CelebFaces Dataset via Controllable 3D Synthetic Models [19.6347170450874]
We present a large-scale animation celebfaces dataset (AnimeCeleb) via controllable synthetic animation models.
To facilitate the data generation process, we build a semi-automatic pipeline based on an open 3D software.
This leads to constructing a large-scale animation face dataset that includes multi-pose and multi-style animation faces with rich annotations.
arXiv Detail & Related papers (2021-11-15T10:00:06Z) - Deep Animation Video Interpolation in the Wild [115.24454577119432]
In this work, we formally define and study the animation video interpolation problem for the first time.
We propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner.
Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild.
arXiv Detail & Related papers (2021-04-06T13:26:49Z)