LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering
- URL: http://arxiv.org/abs/2506.02733v2
- Date: Tue, 29 Jul 2025 04:07:34 GMT
- Title: LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering
- Authors: Xiaoyi Feng, Kaifeng Zou, Caichun Cen, Tao Huang, Hui Guo, Zizhou Huang, Yingli Zhao, Mingqing Zhang, Ziyuan Zheng, Diwei Wang, Yuntao Zou, Dagang Li
- Abstract summary: LinkTo-Anime is the first high-quality dataset specifically designed for cel anime character motion generated with 3D model rendering. The dataset comprises 395 video sequences, totaling 24,230 training frames, 720 validation frames, and 4,320 test frames.
- Score: 7.938152002786597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing optical flow datasets focus primarily on real-world simulation or synthetic human motion, but few are tailored to Celluloid (cel) anime character motion: a domain with unique visual and motion characteristics. To bridge this gap and facilitate research in optical flow estimation and downstream tasks such as anime video generation and line drawing colorization, we introduce LinkTo-Anime, the first high-quality dataset specifically designed for cel anime character motion generated with 3D model rendering. LinkTo-Anime provides rich annotations including forward and backward optical flow, occlusion masks, and the Mixamo skeleton. The dataset comprises 395 video sequences, totaling 24,230 training frames, 720 validation frames, and 4,320 test frames. Furthermore, a comprehensive benchmark is constructed with various optical flow estimation methods to analyze their shortcomings and limitations across multiple datasets.
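For readers who want to experiment with flow annotations of this kind, below is a minimal sketch of loading a forward-flow/occlusion pair and computing the standard end-point error (EPE) used in optical flow benchmarks. The file formats and paths (Middlebury .flo flows, PNG occlusion masks, the seq_0001 directory names) are assumptions for illustration only; the official LinkTo-Anime release may organize its data differently.

```python
# Minimal sketch: load one flow/occlusion sample and compute end-point error (EPE).
# File layout and formats below are assumptions, not the dataset's documented structure.
import numpy as np
from PIL import Image

def read_flo(path):
    """Read a Middlebury .flo optical flow file into an (H, W, 2) float32 array."""
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        assert magic == 202021.25, "Invalid .flo file"
        w = int(np.fromfile(f, np.int32, count=1)[0])
        h = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * w * h)
    return data.reshape(h, w, 2)

def endpoint_error(flow_pred, flow_gt, valid=None):
    """Mean end-point error, optionally restricted to valid (e.g. non-occluded) pixels."""
    epe = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    if valid is not None:
        epe = epe[valid]
    return float(epe.mean())

# Hypothetical paths; the actual directory naming may differ.
flow_fw = read_flo("seq_0001/flow_forward/0001.flo")
occ = np.array(Image.open("seq_0001/occlusion/0001.png")) > 0  # True = occluded
flow_pred = np.zeros_like(flow_fw)                             # stand-in prediction
print("EPE (non-occluded):", endpoint_error(flow_pred, flow_fw, valid=~occ))
```

Evaluating on non-occluded pixels separately from all pixels is the usual convention for datasets that ship occlusion masks, which is why the mask is passed as a validity filter here.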
Related papers
- Aligning Anime Video Generation with Human Feedback [31.701968335565393]
Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. We propose a pipeline to enhance anime video generation by leveraging human feedback for better alignment.
arXiv Detail & Related papers (2025-04-14T09:49:34Z) - Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models [71.78723353724493]
Animation of humanoid characters is essential in various graphics applications. We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes.
arXiv Detail & Related papers (2025-03-20T10:00:22Z) - Animate3D: Animating Any 3D Model with Multi-view Video Diffusion [47.05131487114018]
Animate3D is a novel framework for animating any static 3D model.
We introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects.
arXiv Detail & Related papers (2024-07-16T05:35:57Z) - MotionDreamer: Exploring Semantic Video Diffusion features for Zero-Shot 3D Mesh Animation [10.263762787854862]
We propose a technique for automatic re-animation of various 3D shapes based on a motion prior extracted from a video diffusion model.
We leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines.
Our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes.
arXiv Detail & Related papers (2024-05-30T15:30:38Z) - Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z) - Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes [69.52540205439989]
We introduce Im4D, a hybrid representation that consists of a grid-based geometry representation and a multi-view image-based appearance representation.
We represent the scene appearance by the original multi-view videos and a network that learns to predict the color of a 3D point from image features.
We show that Im4D achieves state-of-the-art performance in rendering quality and can be trained efficiently, while realizing real-time rendering at 79.8 FPS for 512x512 images.
arXiv Detail & Related papers (2023-10-12T17:59:57Z) - PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, build 3D scenes to match the motion capture environments, and render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z) - Learning 3D Photography Videos via Self-supervised Diffusion on Single Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints.
We present a novel task: out-animation, which extends the space and time of input objects.
arXiv Detail & Related papers (2023-02-21T16:18:40Z) - AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies [98.65469430034246]
Existing datasets for two-dimensional (2D) cartoons suffer from simple frame composition and monotonic movements.
We present a new 2D animation visual correspondence dataset, AnimeRun, by converting open source 3D movies to full scenes in 2D style.
Our analyses show that the proposed dataset not only resembles real anime more closely in image composition, but also possesses richer and more complex motion patterns than existing datasets.
arXiv Detail & Related papers (2022-11-10T17:26:21Z) - Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)