Related papers: TC4D: Trajectory-Conditioned Text-to-4D Generation

TC4D: Trajectory-Conditioned Text-to-4D Generation

URL: http://arxiv.org/abs/2403.17920v3
Date: Mon, 14 Oct 2024 18:07:24 GMT
Title: TC4D: Trajectory-Conditioned Text-to-4D Generation
Authors: Sherwin Bahmani, Xian Liu, Wang Yifan, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell,
Abstract summary: We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components. We learn local deformations that conform to the global trajectory using supervision from a text-to-video model. Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
Score: 94.90700997568158
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate-they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The lack of a more flexible motion model contributes to the gap in realism between 4D generation methods and recent, near-photorealistic video generation models. Here, we propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components. We represent the global motion of a scene's bounding box using rigid transformation along a trajectory parameterized by a spline. We learn local deformations that conform to the global trajectory using supervision from a text-to-video model. Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion, which we evaluate qualitatively and through a user study. Video results can be viewed on our website: https://sherwinbahmani.github.io/tc4d.

Related papers

PaintScene4D: Consistent 4D Scene Generation from Text Prompts [29.075849524496707]
PaintScene4D is a novel text-to-4D scene generation framework. It harnesses video generative models trained on diverse real-world datasets. It produces realistic 4D scenes that can be viewed from arbitrary trajectories.
arXiv Detail & Related papers (2024-12-05T18:59:57Z)
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis [60.853577108780414]
Existing 4D generation methods can generate high-quality 4D objects or scenes based on user-friendly conditions. We propose Trans4D, a novel text-to-4D synthesis framework that enables realistic complex scene transitions. In experiments, Trans4D consistently outperforms existing state-of-the-art methods in generating 4D scenes with accurate and high-quality transitions.
arXiv Detail & Related papers (2024-10-09T17:56:03Z)
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z)
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance. Our method surpasses existing methods in both quality and efficiency. We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z)
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects. We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv Detail & Related papers (2023-12-21T11:41:02Z)
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling [91.99172731031206]
Current text-to-4D methods face a three-way tradeoff between quality of scene appearance, 3D structure, and motion. We introduce hybrid score distillation sampling, an alternating optimization procedure that blends supervision signals from multiple pre-trained diffusion models.
arXiv Detail & Related papers (2023-11-29T18:58:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.