4-Doodle: Text to 3D Sketches that Move!
- URL: http://arxiv.org/abs/2510.25319v1
- Date: Wed, 29 Oct 2025 09:33:29 GMT
- Title: 4-Doodle: Text to 3D Sketches that Move!
- Authors: Hao Chen, Jiaqi Wang, Yonggang Qi, Ke Li, Kaiyue Pang, Yi-Zhe Song
- Abstract summary: 4-Doodle is the first training-free framework for generating dynamic 3D sketches from text. Our method produces temporally realistic and structurally stable 3D sketch animations, outperforming existing baselines in both fidelity and controllability.
- Score: 60.89021458068987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel task: text-to-3D sketch animation, which aims to bring freeform sketches to life in dynamic 3D space. Unlike prior works focused on photorealistic content generation, we target sparse, stylized, and view-consistent 3D vector sketches, a lightweight and interpretable medium well-suited for visual communication and prototyping. However, this task is very challenging: (i) no paired dataset exists for text and 3D (or 4D) sketches; (ii) sketches require structural abstraction that is difficult to model with conventional 3D representations like NeRFs or point clouds; and (iii) animating such sketches demands temporal coherence and multi-view consistency, which current pipelines do not address. Therefore, we propose 4-Doodle, the first training-free framework for generating dynamic 3D sketches from text. It leverages pretrained image and video diffusion models through a dual-space distillation scheme: one space captures multi-view-consistent geometry using differentiable Bézier curves, while the other encodes motion dynamics via temporally-aware priors. Unlike prior work (e.g., DreamFusion), which optimizes from a single view per step, our multi-view optimization ensures structural alignment and avoids view ambiguity, critical for sparse sketches. Furthermore, we introduce a structure-aware motion module that separates shape-preserving trajectories from deformation-aware changes, enabling expressive motion such as flipping, rotation, and articulated movement. Extensive experiments show that our method produces temporally realistic and structurally stable 3D sketch animations, outperforming existing baselines in both fidelity and controllability. We hope this work serves as a step toward more intuitive and accessible 4D content creation.
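To make the geometry half of the dual-space scheme concrete, here is a minimal sketch (not the authors' code) of optimizing differentiable 3D Bézier control points against several camera views per step, as the abstract describes. The diffusion-based score-distillation guidance is replaced by a placeholder loss so the loop is self-contained; every name, shape, and hyperparameter below is an assumption.

```python
# Hedged sketch: multi-view optimization of 3D Bezier strokes, in the spirit
# of 4-Doodle's geometry space. The SDS guidance is stubbed out.
import torch

def cubic_bezier(ctrl, t):
    # ctrl: (S, 4, 3) control points for S strokes; t: (T,) samples in [0, 1].
    b = torch.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                     3 * t ** 2 * (1 - t), t ** 3], dim=-1)       # (T, 4)
    return torch.einsum('tk,skd->std', b, ctrl)                   # (S, T, 3)

def project(points, cam):
    # Orthographic projection for brevity: rotate, then drop the depth axis.
    return (points @ cam.T)[..., :2]

strokes = torch.randn(8, 4, 3, requires_grad=True)                # 8 strokes
cams = [torch.linalg.qr(torch.randn(3, 3))[0] for _ in range(4)]  # 4 views
opt = torch.optim.Adam([strokes], lr=1e-2)
t = torch.linspace(0, 1, 32)

for step in range(200):
    loss = 0.0
    for cam in cams:  # every view contributes each step, unlike 1-view SDS
        pts2d = project(cubic_bezier(strokes, t), cam)
        # Placeholder for diffusion (SDS) guidance: pull projected curves
        # toward the unit disk, just so the loop runs end to end.
        loss = loss + (pts2d.norm(dim=-1) - 1.0).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

In the paper's actual pipeline the placeholder term would be a score-distillation signal from pretrained image and video diffusion models; the key point illustrated here is that gradients from all views flow into the same shared set of 3D control points each step.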
Related papers
- ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion [32.32525061239629]
ActionMesh generates animated, action-ready 3D meshes in a feedforward manner using a temporal 3D diffusion model.
arXiv Detail & Related papers (2026-01-22T17:41:13Z) - Drag4D: Align Your Motion with Text-Driven 3D Scene Generation [77.79131321983677]
Drag4D is an interactive framework that integrates object motion control within text-driven 3D scene generation.
This framework enables users to define 3D trajectories for the 3D objects generated from a single image, seamlessly integrating them into a high-quality 3D background.
arXiv Detail & Related papers (2025-09-26T05:23:45Z) - Occlusion-robust Stylization for Drawing-based 3D Animation [20.793887576117527]
We propose an Occlusion-robust Stylization Framework (OSF) for drawing-based 3D animation.
OSF operates in a single run instead of the previous two-stage method, achieving 2.4x faster inference and 2.1x lower memory use.
arXiv Detail & Related papers (2025-08-01T07:52:07Z) - Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation [22.325990468075368]
Animators use the 2D sketches in storyboards as references to craft the desired 3D animations through a trial-and-error process.
There is a high demand for automated methods that can directly translate 2D storyboard sketches into 3D animations.
We present Sketch2Anim, composed of two key modules for sketch constraint understanding and motion generation.
arXiv Detail & Related papers (2025-04-27T10:38:17Z) - In-2-4D: Inbetweening from Two Single-View Images to 4D Generation [63.68181731564576]
We propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from two single-view images.
In contrast to video/4D generation from only text or a single image, our interpolative task can leverage more precise motion control to better constrain the generation.
arXiv Detail & Related papers (2025-04-11T09:01:09Z) - Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering [17.918603435615335]
3D sketches are widely used to visually represent the 3D shape and structure of objects or scenes.
We propose Diff3DS, a novel differentiable framework for generating view-consistent 3D sketches.
Our framework bridges the domains of 3D sketches and images, achieving end-to-end optimization of 3D sketches.
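The bridge between curves and pixels is a differentiable rasterizer. As a rough illustration (not Diff3DS's actual renderer, which rasterizes Bézier curves directly), one can splat sampled curve points into an image with soft Gaussians so that any pixel-space loss backpropagates to the underlying 3D control points; resolution and bandwidth below are arbitrary assumptions.

```python
# Hedged sketch of differentiable curve-to-image rendering via soft splatting.
import torch

def soft_rasterize(pts2d, res=64, sigma=0.02):
    # pts2d: (N, 2) projected curve samples in [-1, 1]^2.
    # Returns a differentiable (res, res) grayscale "ink" image.
    lin = torch.linspace(-1, 1, res)
    yy, xx = torch.meshgrid(lin, lin, indexing='ij')
    grid = torch.stack([xx, yy], dim=-1).reshape(-1, 2)           # (res^2, 2)
    d2 = ((grid[:, None, :] - pts2d[None, :, :]) ** 2).sum(-1)    # (res^2, N)
    # Each pixel takes the strongest response from its nearest curve sample.
    return torch.exp(-d2 / (2 * sigma ** 2)).max(dim=1).values.reshape(res, res)
```

An image-space objective (e.g., a CLIP or diffusion score on this rendering) then differentiates through `pts2d` back to the 3D curve parameters.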
arXiv Detail & Related papers (2024-05-24T07:48:14Z) - Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation [55.73399465968594]
This paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description.
Three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss, and sketch similarity optimization with a CLIP-based geometric similarity loss.
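A loose sketch of how the last two objectives might combine is below; the distribution-transfer term is omitted, and the encoder, tensor shapes, and weights are placeholders rather than the paper's implementation.

```python
# Hedged sketch of a Sketch3D-style combined loss (color MSE + CLIP-space
# geometric similarity). `clip_embed` is any image encoder returning
# unit-norm features; it is an assumption, not the paper's exact model.
import torch
import torch.nn.functional as F

def sketch3d_loss(render_rgb, target_rgb, render_edges, sketch_edges,
                  clip_embed, w_color=1.0, w_geom=0.5):
    # Color term: straightforward MSE between render and text-guided target.
    color = F.mse_loss(render_rgb, target_rgb)
    # Geometric term: cosine distance between encoded edge maps of the
    # render and of the input sketch.
    geom = 1 - F.cosine_similarity(clip_embed(render_edges),
                                   clip_embed(sketch_edges), dim=-1).mean()
    return w_color * color + w_geom * geom
```

Gradients from both terms flow into the 3D Gaussian parameters through the differentiable renderer that produced `render_rgb` and `render_edges`.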
arXiv Detail & Related papers (2024-04-02T11:03:24Z) - 3Doodle: Compact Abstraction of Objects with 3D Strokes [30.87733869892925]
We propose 3Doodle, which generates descriptive and view-consistent sketch images.
Our method is based on the idea that a set of 3D strokes can efficiently represent 3D structural information.
The resulting sparse set of 3D strokes can be rendered as abstract sketches containing essential 3D characteristic shapes of various objects.
arXiv Detail & Related papers (2024-02-06T04:25:07Z) - Control3D: Towards Controllable Text-to-3D Generation [107.81136630589263]
We present a text-to-3D generation conditioning on the additional hand-drawn sketch, namely Control3D.
A 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of a 3D scene parameterized as a NeRF.
We exploit a pre-trained differentiable photo-to-sketch model to directly estimate the sketch of the image rendered from the synthetic 3D scene.
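In code, that consistency signal might look like the following minimal sketch; `photo2sketch` stands in for the pre-trained differentiable photo-to-sketch network and is a hypothetical callable, not Control3D's released API.

```python
# Hedged sketch of a Control3D-style sketch-consistency term.
import torch
import torch.nn.functional as F

def sketch_consistency_loss(render, user_sketch, photo2sketch):
    # render: (B, 3, H, W) image rendered from the NeRF scene.
    # user_sketch: (B, 1, H, W) hand-drawn conditioning sketch.
    est = photo2sketch(render)           # differentiable estimated edge map
    return F.mse_loss(est, user_sketch)  # penalize disagreement with input
```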
arXiv Detail & Related papers (2023-11-09T15:50:32Z) - 3D VR Sketch Guided 3D Shape Prototyping and Exploration [108.6809158245037]
We propose a 3D shape generation network that takes a 3D VR sketch as a condition.
We assume that sketches are created by novices without art training.
Our method creates multiple 3D shapes that align with the original sketch's structure.
arXiv Detail & Related papers (2023-06-19T10:27:24Z) - ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.