3DProxyImg: Controllable 3D-Aware Animation Synthesis from Single Image via 2D-3D Aligned Proxy Embedding
- URL: http://arxiv.org/abs/2512.15126v2
- Date: Fri, 19 Dec 2025 06:41:20 GMT
- Title: 3DProxyImg: Controllable 3D-Aware Animation Synthesis from Single Image via 2D-3D Aligned Proxy Embedding
- Authors: Yupeng Zhu, Xiongzhen Zhang, Ye Chen, Bingbing Ni
- Abstract summary: We propose a lightweight 3D animation framework that decouples geometric control from appearance synthesis. Our method achieves efficient animation generation on low-power platforms.
- Score: 46.75707405618843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D animation is central to modern visual media, yet traditional production pipelines remain labor-intensive, expertise-demanding, and computationally expensive. Recent AIGC-based approaches partially automate asset creation and rigging, but they either inherit the heavy costs of full 3D pipelines or rely on video-synthesis paradigms that sacrifice 3D controllability and interactivity. We focus on single-image 3D animation generation and argue that progress is fundamentally constrained by a trade-off between rendering quality and 3D control. To address this limitation, we propose a lightweight 3D animation framework that decouples geometric control from appearance synthesis. The core idea is a 2D-3D aligned proxy representation that uses a coarse 3D estimate as a structural carrier, while delegating high-fidelity appearance and view synthesis to learned image-space generative priors. This proxy formulation enables 3D-aware motion control and interaction comparable to classical pipelines, without requiring accurate geometry or expensive optimization, and naturally extends to coherent background animation. Extensive experiments demonstrate that our method achieves efficient animation generation on low-power platforms and outperforms video-based 3D animation generation in identity preservation, geometric and textural consistency, and the level of precise, interactive control it offers to users.
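The abstract only outlines the decoupled design, so the following is a minimal illustrative sketch of how such a pipeline could be wired together. Everything here is an assumption made for exposition: the function names (`estimate_proxy`, `drive`, `project_guidance`, `synthesize`), the flat billboard stand-in for the coarse 3D proxy, the pinhole projection, and the placeholder generative prior are not the authors' implementation or API.

```python
"""Hypothetical sketch of the decoupled pipeline: a coarse 3D proxy carries
motion control, while appearance synthesis is delegated to an image-space
generative prior. Names, shapes, and the billboard proxy are assumptions."""
from dataclasses import dataclass

import numpy as np


@dataclass
class Proxy3D:
    vertices: np.ndarray  # (V, 3) coarse geometry: a structural carrier only
    uv: np.ndarray        # (V, 2) 2D-3D alignment back to the source image


def estimate_proxy(image: np.ndarray) -> Proxy3D:
    """Stand-in for a monocular coarse-geometry estimator (here: a billboard)."""
    h, w = image.shape[:2]
    verts = np.array([[0.0, 0.0, 0.0], [w, 0.0, 0.0],
                      [w, h, 0.0], [0.0, h, 0.0]])
    return Proxy3D(verts, verts[:, :2] / np.array([w, h], dtype=float))


def drive(proxy: Proxy3D, rotation: np.ndarray) -> Proxy3D:
    """3D-aware motion control: the transform acts on the proxy, not pixels."""
    return Proxy3D(proxy.vertices @ rotation.T, proxy.uv)


def project_guidance(proxy: Proxy3D, focal: float = 500.0) -> np.ndarray:
    """Pinhole-project the driven proxy into 2D guidance for the image prior."""
    z = proxy.vertices[:, 2:3] + focal          # push proxy in front of camera
    return focal * proxy.vertices[:, :2] / z    # (V, 2) image-plane coordinates


def synthesize(image: np.ndarray, guidance: np.ndarray) -> np.ndarray:
    """Placeholder for the learned image-space generative prior; a real system
    would run e.g. a guidance-conditioned diffusion model here."""
    return image


# Usage: estimate the proxy once, then re-pose it per animation frame.
image = np.zeros((256, 256, 3))
proxy = estimate_proxy(image)
theta = np.deg2rad(10.0)
yaw = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                [0.0, 1.0, 0.0],
                [-np.sin(theta), 0.0, np.cos(theta)]])
frame = synthesize(image, project_guidance(drive(proxy, yaw)))
```

The point of the structure is the division of labor: only `estimate_proxy` touches 3D reconstruction, only `synthesize` needs a heavy generative model, and per-frame control reduces to cheap transforms of the proxy, which is what makes animation on low-power platforms plausible.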
Related papers
- Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation [46.27695095774081]
2D diffusion-based methods often compromise 3D consistency and speed. Feedforward 3D-aware facial animation methods ensure 3D consistency and achieve faster inference. Our method runs at 107.31 FPS for animation and pose control while achieving animation quality comparable to the state of the art.
arXiv Detail & Related papers (2025-12-18T18:53:28Z) - Drag4D: Align Your Motion with Text-Driven 3D Scene Generation [77.79131321983677]
Drag4D is an interactive framework that integrates object motion control within text-driven 3D scene generation. It enables users to define 3D trajectories for the 3D objects generated from a single image, seamlessly integrating them into a high-quality 3D background.
arXiv Detail & Related papers (2025-09-26T05:23:45Z) - Puppeteer: Rig and Animate Your 3D Models [105.11046762553121]
Puppeteer is a comprehensive framework that addresses both automatic rigging and animation for diverse 3D objects. Our system first predicts plausible skeletal structures via an auto-regressive transformer. It then infers skinning weights via an attention-based architecture.
arXiv Detail & Related papers (2025-08-14T17:59:31Z) - Constructing a 3D Scene from a Single Image [31.11317559252235]
SceneFuse-3D is a training-free framework designed to synthesize coherent 3D scenes from a single top-down view. We decompose the input image into overlapping regions and generate each using a pretrained 3D object generator. This modular design allows us to overcome resolution bottlenecks and preserve spatial structure without requiring 3D supervision or fine-tuning.
arXiv Detail & Related papers (2025-05-21T17:10:47Z) - I2V3D: Controllable image-to-video generation with 3D guidance [42.23117201457898]
I2V3D is a framework for animating static images into dynamic videos with precise 3D control. Our approach combines the precision of a computer graphics pipeline with advanced generative models.
arXiv Detail & Related papers (2025-03-12T18:26:34Z) - Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes [49.26872036160368]
We propose a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes.
arXiv Detail & Related papers (2024-11-28T16:01:58Z) - MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling [21.1274747033854]
Character video synthesis aims to produce realistic videos of animatable characters within lifelike scenes. MIMO is a novel framework that can synthesize character videos with controllable attributes. MIMO achieves advanced scalability to arbitrary characters, generality to novel 3D motions, and applicability to interactive real-world scenes.
arXiv Detail & Related papers (2024-09-24T15:00:07Z) - iControl3D: An Interactive System for Controllable 3D Scene Generation [57.048647153684485]
iControl3D is a novel interactive system that empowers users to generate and render customizable 3D scenes with precise control.
We leverage 3D meshes as an intermediary proxy to iteratively merge individual 2D diffusion-generated images into a cohesive and unified 3D scene representation.
Our neural rendering interface enables users to build a radiance field of their scene online and navigate the entire scene.
arXiv Detail & Related papers (2024-08-03T06:35:09Z) - CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z) - Unsupervised Volumetric Animation [54.52012366520807]
We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects.
Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos.
We show our model can obtain animatable 3D objects from a single volume or few images.
arXiv Detail & Related papers (2023-01-26T18:58:54Z)