FuguReport

Helix4D: Complex 4D Mesh Generation

Authors: Jiraphon Yenphraphai, Jianqi Chen, Jian Wang, Gordon Qian, Sergey Tulyakov, Rameen Abdal, Raymond A. Yeh, Peter Wonka, Chaoyang Wang
Affiliations: Snap / King Abdullah University of Science and Technology / Purdue University
Categories: Method / 3D Reconstruction / 4D mesh generation from video, Application / Geometry Processing / Modeling complex topology and structures, Method / Dynamic Mesh / Mesh generation framework
License: CC BY 4.0

Abstract Overview

Helix4D is a video-to-4D dynamic mesh generation framework that adapts the pretrained Trellis2 image-to-3D model to produce temporally consistent mesh sequences from object-centric videos. The paper targets cases that prior methods handle poorly, including topology changes, transparent or semi-transparent materials, thin structures, and inner surfaces. Its design combines three components: sliding-window cross-frame attention with a first-frame anchor, first-frame conditioning from a frozen Trellis2 reconstruction, and a parameter-free 4D positional encoding that repurposes low-frequency spatial RoPE bands for time. The authors evaluate the method on ActionBench, a held-out TexVerse subset, and Helix4DBench, a new 52-video benchmark emphasizing complex dynamics and materials.
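The sliding-window attention with a first-frame anchor can be illustrated with a minimal NumPy sketch. It assumes a symmetric window of `radius` frames around each query frame, frame 0 as the anchor, and a fixed number of tokens per frame. `anchor_window_mask`, `expand_to_tokens`, and `masked_attention` are hypothetical helpers, not Helix4D or Trellis2 code, and the paper's actual window shape and token layout may differ.

```python
import numpy as np


def anchor_window_mask(num_frames: int, radius: int, anchor: int = 0) -> np.ndarray:
    """Frame-level mask: mask[i, j] is True if frame i may attend to frame j.

    Hypothetical layout: frame i sees frames within `radius` steps of itself,
    plus the `anchor` frame, which every frame can always see.
    """
    idx = np.arange(num_frames)
    mask = np.abs(idx[:, None] - idx[None, :]) <= radius  # local sliding window
    mask[:, anchor] = True                                 # global first-frame anchor
    return mask


def expand_to_tokens(frame_mask: np.ndarray, tokens_per_frame: int) -> np.ndarray:
    """Give every token of frame i the same row/column pattern as frame i."""
    return frame_mask.repeat(tokens_per_frame, axis=0).repeat(tokens_per_frame, axis=1)


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention; disallowed pairs get zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # stable softmax (diagonal is always allowed)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


frame_mask = anchor_window_mask(num_frames=6, radius=1)
print(frame_mask[4].astype(int))  # [1 0 0 1 1 1]: anchor 0 plus frames 3, 4, 5

rng = np.random.default_rng(0)
tok_mask = expand_to_tokens(frame_mask, tokens_per_frame=2)  # 12 x 12 token mask
q, k, v = rng.standard_normal((3, 12, 8))
out = masked_attention(q, k, v, tok_mask)
```

In this sketch the tokens of frame 4 attend only to frames 3 to 5 and to the anchor, so perturbing a value vector belonging to frame 1 leaves frame 4's output unchanged. Keeping the anchor column visible gives every frame a direct path to the first frame, even when that frame lies far outside its local window.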

Novelty

The main novelty is a systematic way to lift a strong static 3D foundation model into video-conditioned 4D mesh generation while preserving pretrained geometric and material capabilities. Technically, the paper introduces anchor-based sliding-window cross-frame attention and a parameter-free spatiotemporal RoPE that reallocates redundant low-frequency spatial bands to temporal encoding instead of adding new temporal parameters.
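The band reallocation can be sketched in NumPy under stated assumptions: a 3D axial RoPE that splits each attention head evenly into x, y, and z channel groups with standard per-axis frequencies `base ** (-band / pairs_per_axis)`, and a hypothetical rule that the lowest `n_time_pairs` frequency pairs of every axis are rotated by the frame index `t` instead of the spatial coordinate. `rope_angles`, `apply_rope`, and `score` are illustrative names, and the paper's exact band selection is not specified in this summary. Because only the inputs to fixed rotations change, the encoding adds no learnable parameters, and attention logits still depend only on relative offsets in x, y, z, and t.

```python
import numpy as np


def rope_angles(coords: np.ndarray, head_dim: int, n_time_pairs: int,
                base: float = 10000.0) -> np.ndarray:
    """Map coords [N, 4] = (x, y, z, t) to rotation angles [N, head_dim // 2].

    Assumption: channels are split evenly across x, y, z; within each axis the
    lowest `n_time_pairs` frequency pairs are driven by t instead of space.
    """
    if head_dim % 6 != 0:
        raise ValueError("head_dim must split into 3 axes of channel pairs")
    pairs_per_axis = head_dim // 6
    if not 0 <= n_time_pairs <= pairs_per_axis:
        raise ValueError("n_time_pairs out of range")
    band = np.arange(pairs_per_axis)
    freqs = base ** (-band / pairs_per_axis)  # ordered from high to low frequency
    per_axis = []
    for axis in range(3):
        pos = np.repeat(coords[:, axis:axis + 1], pairs_per_axis, axis=1).astype(float)
        # Hypothetical reallocation: the lowest-frequency pairs read the frame index t.
        pos[:, pairs_per_axis - n_time_pairs:] = coords[:, 3:4]
        per_axis.append(pos * freqs)
    return np.concatenate(per_axis, axis=1)


def apply_rope(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate each channel pair (x[:, 2i], x[:, 2i + 1]) by angles[:, i]."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


rng = np.random.default_rng(0)
q, k = rng.standard_normal((2, 1, 24))  # one query and one key token, head_dim = 24


def score(pos_q, pos_k, n_time_pairs=1):
    """Attention logit between q at pos_q and k at pos_k, both given as (x, y, z, t)."""
    aq = rope_angles(np.array([pos_q], dtype=float), 24, n_time_pairs)
    ak = rope_angles(np.array([pos_k], dtype=float), 24, n_time_pairs)
    return (apply_rope(q, aq) @ apply_rope(k, ak).T).item()


# The same (dx, dy, dz, dt) offset at different absolute positions gives the same logit.
print(np.isclose(score((1, 2, 3, 0), (4, 0, 3, 2)), score((5, 6, 7, 3), (8, 4, 7, 5))))  # True
```

With `base=10000` and four pairs per axis, the lowest band has frequency 0.001, so across a hypothetical 64-voxel axis it rotates by less than 0.064 rad. Such a band carries almost no spatial information, which illustrates why reusing it for time costs little.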

Results

Helix4D improves CD-3D by 3.8% over ActionMesh on ActionBench. On the harder 52-video Helix4DBench, it outperforms all reported baselines on every metric, improving ULIP-2 and Uni3D scores by 5.7% and 7.8%, respectively, over the strongest baseline. In user studies, it is preferred over the best-performing baseline in 67.9% of comparisons, and on a held-out TexVerse test set it achieves the best CD-3D and CD-4D among the compared methods. Ablations further show that first-frame conditioning, the proposed 4D rotary embedding, and sliding-window-plus-anchor attention each contribute to quality and temporal consistency.

Key Points

  1. Helix4D extends Trellis2 from single-image 3D generation to video-conditioned 4D mesh generation while retaining support for non-watertight geometry, complex materials, and inner surfaces.
  2. The method uses sliding-window cross-frame attention with a first-frame anchor and first-frame conditioning so later frames can inherit strong static reconstruction priors efficiently.
  3. Across Helix4DBench, ActionBench, and a held-out TexVerse subset, the authors report the strongest overall quantitative results among the compared methods, especially on sequences with challenging topology and material changes.
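Point 2 credits the window-plus-anchor design with efficiency, which can be made concrete by counting the frame pairs that interact. This pure-Python sketch assumes each frame attends to frames within `radius` of itself plus frame 0 as a global anchor; `attended_frame_pairs` is a hypothetical helper, and the paper's actual window size is not given in this summary. Token-level attention cost scales with these frame-pair counts times the square of the tokens per frame.

```python
def attended_frame_pairs(num_frames: int, radius: int, anchor: int = 0) -> int:
    """Count (query frame, key frame) pairs under a sliding window plus a global anchor.

    Hypothetical scheme: frame i attends to frames within `radius` of i, plus `anchor`.
    """
    total = 0
    for i in range(num_frames):
        lo, hi = max(0, i - radius), min(num_frames - 1, i + radius)
        total += hi - lo + 1                      # frames inside the local window
        total += 0 if lo <= anchor <= hi else 1   # anchor added only when outside the window
    return total


for T in (16, 64, 256):
    windowed = attended_frame_pairs(T, radius=2)
    print(f"T={T:4d}  full={T * T:6d}  window+anchor={windowed:5d}")
```

With `radius=2` the count equals 6T - 9 for T >= 6 (87, 375, and 1527 pairs at T = 16, 64, and 256), so cost grows linearly in the number of frames instead of quadratically as with full cross-frame attention, while the single anchor column keeps every frame connected to the first frame.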

References

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.