DyNCA: Real-time Dynamic Texture Synthesis Using Neural Cellular Automata
- URL: http://arxiv.org/abs/2211.11417v2
- Date: Thu, 30 Mar 2023 21:56:33 GMT
- Title: DyNCA: Real-time Dynamic Texture Synthesis Using Neural Cellular Automata
- Authors: Ehsan Pajouheshgar, Yitao Xu, Tong Zhang, Sabine Süsstrunk
- Abstract summary: We propose Dynamic Neural Cellular Automata (DyNCA), a framework for real-time and controllable dynamic texture synthesis.
Our method is built upon the recently introduced NCA models and can synthesize infinitely long and arbitrary-sized realistic video textures in real time.
Our model offers several real-time video controls including motion speed, motion direction, and an editing brush tool.
- Score: 12.05119084381406
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Current Dynamic Texture Synthesis (DyTS) models can synthesize realistic
videos. However, they require a slow iterative optimization process to
synthesize a single fixed-size short video, and they do not offer any
post-training control over the synthesis process. We propose Dynamic Neural
Cellular Automata (DyNCA), a framework for real-time and controllable dynamic
texture synthesis. Our method is built upon the recently introduced NCA models
and can synthesize infinitely long and arbitrary-sized realistic video textures
in real time. We quantitatively and qualitatively evaluate our model and show
that our synthesized videos appear more realistic than the existing results. We
improve the SOTA DyTS performance by $2\sim 4$ orders of magnitude. Moreover,
our model offers several real-time video controls including motion speed,
motion direction, and an editing brush tool. We exhibit our trained models in
an online interactive demo that runs on local hardware and is accessible on
personal computers and smartphones.
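For context, below is a minimal sketch of the kind of NCA update step that DyNCA builds on: fixed perception filters (identity plus Sobel gradients) gather local neighborhood information, and a small learned network maps each cell's perception vector to a state update. The channel count, hidden width, stochastic update rate, and circular padding are illustrative assumptions, not DyNCA's exact configuration.

```python
# A minimal NCA update step (illustrative; not DyNCA's exact architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCAStep(nn.Module):
    def __init__(self, n_channels=12, hidden=96):
        super().__init__()
        # Fixed perception filters: identity, Sobel-x, Sobel-y, applied per channel.
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
        identity = torch.zeros(3, 3)
        identity[1, 1] = 1.0
        kernels = torch.stack([identity, sobel_x, sobel_x.t()])   # (3, 3, 3)
        self.register_buffer("kernels",
                             kernels.repeat(n_channels, 1, 1).unsqueeze(1))
        self.n_channels = n_channels
        # Learned per-cell update rule; 1x1 convolutions act as a small MLP.
        self.update = nn.Sequential(
            nn.Conv2d(3 * n_channels, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, n_channels, 1),
        )

    def forward(self, state):                      # state: (B, C, H, W)
        # Circular padding lets the synthesized texture tile seamlessly.
        padded = F.pad(state, (1, 1, 1, 1), mode="circular")
        perception = F.conv2d(padded, self.kernels, groups=self.n_channels)
        delta = self.update(perception)
        # Stochastic update: each cell fires with probability 0.5 per step.
        mask = (torch.rand_like(state[:, :1]) < 0.5).float()
        return state + delta * mask

state = torch.randn(1, 12, 64, 64)                 # any grid size works
state = NCAStep()(state)                           # one step; RGB = state[:, :3]
```

Iterating this step and reading the first channels as RGB yields an evolving pattern; real-time playback is plausible because each step is only a handful of small convolutions, and the grid can be any size.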
Related papers
- Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes [22.450051108066216]
We present C2R (Coarse-to-Real), a generative framework that synthesizes real-style urban crowd videos.
Our approach uses coarse 3D renderings to explicitly control scene layout, camera motion, and human trajectories.
It produces temporally consistent, controllable, and realistic urban scene videos from minimal 3D input.
arXiv Detail & Related papers (2026-01-29T20:29:04Z)
- MAD: Motion Appearance Decoupling for efficient Driving World Models [94.40548866741791]
We propose an efficient adaptation framework that converts generalist video models into controllable driving world models.
The key idea is to decouple motion learning from appearance synthesis.
Scaling to LTX, our MAD-LTX model outperforms all open-source competitors.
arXiv Detail & Related papers (2026-01-14T12:52:23Z)
- SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation [64.3409486422946]
We present SpriteHand, an autoregressive video generation framework for real-time synthesis of hand-object interaction videos.
Our model employs a causal inference architecture for autoregressive generation and leverages a hybrid post-training approach to enhance visual realism and temporal coherence.
Experiments demonstrate superior visual quality, physical plausibility, and interaction fidelity compared to both generative and engine-based baselines.
arXiv Detail & Related papers (2025-12-01T18:13:40Z)
- Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising [83.09163795450407]
We propose an approach to enhancing synthetic video realism, which can re-render synthetic videos from a simulator in a photorealistic fashion.
Our framework focuses on preserving the multi-level structures of the synthetic video in the enhanced one, in both the spatial and temporal domains.
arXiv Detail & Related papers (2025-11-18T18:06:29Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators.
To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module (one possible form is sketched below).
Experiments demonstrate that DWS can be applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
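As one hedged guess at what a lightweight action-conditioned module could look like, the sketch below uses FiLM-style modulation: the action vector produces per-channel scale and shift parameters that modulate video features. FiLM is a stand-in assumption; the summary above does not specify the paper's actual design.

```python
# Hedged sketch of action conditioning via FiLM-style feature modulation.
import torch
import torch.nn as nn

class ActionConditioner(nn.Module):
    def __init__(self, action_dim=8, channels=64):
        super().__init__()
        # Map the action vector to per-channel scale and shift.
        self.to_scale_shift = nn.Linear(action_dim, 2 * channels)

    def forward(self, feat, action):             # feat: (B, C, T, H, W)
        scale, shift = self.to_scale_shift(action).chunk(2, dim=-1)
        scale = scale[:, :, None, None, None]    # broadcast over T, H, W
        shift = shift[:, :, None, None, None]
        return feat * (1 + scale) + shift

feat = torch.randn(2, 64, 8, 16, 16)             # video features
action = torch.randn(2, 8)                       # e.g. steering/throttle codes
print(ActionConditioner()(feat, action).shape)   # torch.Size([2, 64, 8, 16, 16])
```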
- StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models [59.55232046525733]
We introduce StreetCrafter, a controllable video diffusion model that utilizes LiDAR point cloud renderings as pixel-level conditions.
In addition, the utilization of pixel-level LiDAR conditions allows us to make accurate pixel-level edits to target scenes.
Our model enables flexible control over viewpoint changes, enlarging the regions that can be rendered satisfactorily.
arXiv Detail & Related papers (2024-12-17T18:58:55Z)
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
arXiv Detail & Related papers (2024-03-26T17:55:11Z)
- Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models, which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video (a minimal sketch of this space-time down/up-sampling follows below).
arXiv Detail & Related papers (2024-01-23T18:05:25Z)
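A toy illustration of the space-time down/up-sampling mentioned above: strided 3D convolutions shrink the temporal axis together with the spatial axes, so the network processes the whole clip at a coarse resolution before restoring it. Channel counts and the single-level depth are illustrative assumptions, not Lumiere's architecture.

```python
# Hedged sketch of joint spatial + temporal down/up-sampling with 3D convs.
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    def __init__(self, c_in=16, c_mid=32):
        super().__init__()
        # Downsample H, W, and T together with a strided 3D convolution.
        self.down = nn.Conv3d(c_in, c_mid, kernel_size=3, stride=2, padding=1)
        self.up = nn.ConvTranspose3d(c_mid, c_in, kernel_size=4, stride=2, padding=1)

    def forward(self, video):                   # video: (B, C, T, H, W)
        coarse = torch.relu(self.down(video))   # (B, C_mid, T/2, H/2, W/2)
        return self.up(coarse)                  # back to (B, C, T, H, W)

x = torch.randn(1, 16, 16, 32, 32)              # a 16-frame, 32x32 feature clip
print(SpaceTimeBlock()(x).shape)                # torch.Size([1, 16, 16, 32, 32])
```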
- Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame (a minimal loop of this kind is sketched below).
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
arXiv Detail & Related papers (2023-06-01T07:48:34Z)
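A minimal sketch of the auto-regressive generation loop described above: each frame is produced by a conditional model given the previous frame, so sequences of arbitrary length stream out one pose at a time. The `denoise` interface and the pose dimension are assumptions standing in for A-MDM's per-frame diffusion sampler.

```python
# Hedged sketch of auto-regressive, frame-by-frame motion generation.
import torch

def generate_motion(denoise, initial_pose, n_frames, pose_dim=63):
    poses = [initial_pose]                       # (pose_dim,) joint parameters
    for _ in range(n_frames - 1):
        noise = torch.randn(pose_dim)            # diffusion starts from noise
        next_pose = denoise(noise, cond=poses[-1])
        poses.append(next_pose)
    return torch.stack(poses)                    # (n_frames, pose_dim)

# Dummy denoiser: drifts the conditioning pose slightly (placeholder only).
dummy = lambda z, cond: cond + 0.01 * z
print(generate_motion(dummy, torch.zeros(63), n_frames=8).shape)
```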
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks [32.00371492516123]
We present a model-based planning framework for modeling and manipulating elasto-plastic objects.
Our system, RoboCraft, learns a particle-based dynamics model using graph neural networks (GNNs) to capture the structure of the underlying system (a message-passing sketch follows this entry).
We show through experiments that with just 10 minutes of real-world robotic interaction data, our robot can learn a dynamics model that can be used to synthesize control signals to deform elasto-plastic objects into various target shapes.
arXiv Detail & Related papers (2022-05-05T20:28:15Z)
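The sketch below illustrates one GNN message-passing step of the kind a learned particle simulator uses: particles within a radius exchange messages, and each particle updates its position from the aggregated messages. The MLP sizes, radius-graph construction, and position-only features are illustrative assumptions.

```python
# Hedged sketch of one message-passing step for particle dynamics.
import torch
import torch.nn as nn

class ParticleGNNStep(nn.Module):
    def __init__(self, dim=3, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(nn.Linear(dim + hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, dim))

    def forward(self, pos, radius=0.1):          # pos: (N, 3) particle positions
        dist = torch.cdist(pos, pos)             # (N, N) pairwise distances
        src, dst = (dist < radius).nonzero(as_tuple=True)
        messages = self.edge_mlp(torch.cat([pos[src], pos[dst]], dim=-1))
        agg = torch.zeros(pos.size(0), messages.size(-1))
        agg.index_add_(0, dst, messages)         # sum messages per particle
        return pos + self.node_mlp(torch.cat([pos, agg], dim=-1))

pos = torch.rand(100, 3)
print(ParticleGNNStep()(pos).shape)              # torch.Size([100, 3])
```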
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature renderer.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
- Inferring Articulated Rigid Body Dynamics from RGBD Video [18.154013621342266]
We introduce a pipeline that combines inverse rendering with differentiable simulation to create digital twins of real-world articulated mechanisms.
Our approach accurately reconstructs the kinematic tree of an articulated mechanism being manipulated by a robot.
arXiv Detail & Related papers (2022-03-20T08:19:02Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- Real-time Deep Dynamic Characters [95.5592405831368]
We propose a deep videorealistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance.
We use a novel graph convolutional network architecture to enable motion-dependent deformation learning of body and clothing.
We show that our model creates motion-dependent surface deformations, physically plausible dynamic clothing deformations, as well as video-realistic surface textures at a much higher level of detail than previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-04T23:28:55Z)
- Dynamic Texture Synthesis by Incorporating Long-range Spatial and Temporal Correlations [27.247382497265214]
We introduce a new loss term, called the Shifted Gram loss, to capture the structural and long-range correlation of the reference texture video (a toy version is sketched below).
We also introduce a frame sampling strategy to exploit long-period motion across multiple frames.
arXiv Detail & Related papers (2021-04-13T05:04:51Z)
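A toy version of a shifted Gram-style loss, as referenced above: a standard Gram matrix correlates features at the same location, so correlating a feature map with a spatially shifted copy of itself additionally captures longer-range structure. The shift set and normalization here are assumptions; the paper's exact formulation may differ.

```python
# Hedged sketch of a shifted Gram-style texture loss.
import torch

def shifted_gram(feat, dx, dy):                  # feat: (C, H, W)
    c, h, w = feat.shape
    a = feat[:, :h - dy, :w - dx].reshape(c, -1)
    b = feat[:, dy:, dx:].reshape(c, -1)         # the same map, shifted
    return (a @ b.t()) / a.size(1)               # (C, C) correlation matrix

def shifted_gram_loss(feat_syn, feat_ref, shifts=((0, 0), (8, 0), (0, 8))):
    # Match synthesized and reference correlations across several shifts.
    return sum(((shifted_gram(feat_syn, dx, dy) -
                 shifted_gram(feat_ref, dx, dy)) ** 2).mean()
               for dx, dy in shifts)

f_syn, f_ref = torch.rand(16, 64, 64), torch.rand(16, 64, 64)
print(shifted_gram_loss(f_syn, f_ref))
```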