WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
- URL: http://arxiv.org/abs/2602.11845v1
- Date: Thu, 12 Feb 2026 11:38:35 GMT
- Title: WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
- Authors: Qisen Wang, Yifan Zhao, Jia Li
- Abstract summary: WorldTree is a unified framework that enables coarse-to-fine optimization based on an inheritance-based partition tree structure for hierarchical temporal decomposition. Our proposed method achieves an 8.26% improvement in LPIPS on NVIDIA-LS and a 9.09% improvement in mLPIPS on DyCheck compared to the second-best method.
- Score: 13.122536259577453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic reconstruction has achieved remarkable progress, but challenges remain in handling monocular input for more practical applications. The prevailing works attempt to construct efficient motion representations, but lack a unified spatiotemporal decomposition framework, suffering from either holistic temporal optimization or coupled hierarchical spatial composition. To this end, we propose WorldTree, a unified framework comprising a Temporal Partition Tree (TPT), which enables coarse-to-fine optimization based on an inheritance-based partition tree structure for hierarchical temporal decomposition, and Spatial Ancestral Chains (SAC), which recursively query the ancestral hierarchical structure to provide complementary spatial dynamics while specializing motion representations across ancestral nodes. Experimental results on different datasets indicate that our proposed method achieves an 8.26% improvement in LPIPS on NVIDIA-LS and a 9.09% improvement in mLPIPS on DyCheck compared to the second-best method. Code: https://github.com/iCVTEAM/WorldTree.
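The abstract describes TPT and SAC only at a high level. The following is a hypothetical toy sketch of the two ideas, not code from the authors' repository: it assumes the frame range is split in half at each refinement step, that each tree node stores an additive displacement residual for a fixed set of points, and that a child "inherits" its parent by starting from a zero residual. Querying a frame walks its root-to-leaf ancestral chain and sums the residuals, so coarse nodes supply shared motion and deeper nodes specialize it. The names `TPTNode`, `split`, `leaf_for`, `ancestral_chain`, and `displacement` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TPTNode:
    """Hypothetical temporal-partition node covering frames [start, end)."""
    start: int
    end: int
    residual: List[float]  # assumed per-point displacement residual owned by this node
    parent: Optional["TPTNode"] = None
    children: List["TPTNode"] = field(default_factory=list)

    def split(self) -> None:
        """Coarse-to-fine step: halve the interval. Children start with a zero
        residual, so they initially reproduce (inherit) the parent's motion."""
        if self.children or self.end - self.start < 2:
            return
        mid = (self.start + self.end) // 2
        zeros = [0.0] * len(self.residual)
        self.children = [TPTNode(self.start, mid, list(zeros), self),
                         TPTNode(mid, self.end, list(zeros), self)]


def leaf_for(root: TPTNode, t: int) -> TPTNode:
    """Descend to the finest node whose interval contains frame t."""
    node = root
    while node.children:
        node = next(c for c in node.children if c.start <= t < c.end)
    return node


def ancestral_chain(node: TPTNode) -> List[TPTNode]:
    """Collect the node and all its ancestors, ordered root first."""
    chain = []
    current: Optional[TPTNode] = node
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain[::-1]


def displacement(root: TPTNode, t: int) -> List[float]:
    """Sum residuals along the root-to-leaf ancestral chain for frame t."""
    total = [0.0] * len(root.residual)
    for node in ancestral_chain(leaf_for(root, t)):
        total = [a + b for a, b in zip(total, node.residual)]
    return total


# Usage: an 8-frame clip with 2 tracked values, refined unevenly.
root = TPTNode(0, 8, [1.0, 0.0])
root.split()                              # frames [0, 4) and [4, 8)
root.children[1].residual = [0.5, 0.5]    # specialize the second half
root.children[1].split()                  # frames [4, 6) and [6, 8)
print(displacement(root, 2))  # [1.0, 0.0]: root + first child
print(displacement(root, 6))  # [1.5, 0.5]: root + second child + grandchild
```

Because refinement only adds zero-initialized nodes, splitting never changes the current output; in this toy, motion changes only once a deeper residual is updated.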
Related papers
- Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling [14.415169190908676]
We propose Dynamic-TreeRPO, which implements the sliding-window sampling strategy as tree-structured noise intensities along depth. With well-designed noise intensities for each tree layer, Dynamic-TreeRPO can enhance the variation of exploration without any extra computational cost.
(2025-09-27T14:59:31Z)
- TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling [65.46347858249295]
TreePO is a self-guided rollout algorithm that views sequence generation as a tree-structured searching process. TreePO essentially reduces the per-update compute burden while preserving or enhancing exploration diversity.
(2025-08-24T16:52:37Z)
- Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis [49.00783841494125]
HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and quantized diffusion processes. HDTree's effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets. These contributions provide a new tool for hierarchical lineage analysis, enabling more accurate and efficient modeling of cellular differentiation paths.
(2025-06-29T15:19:13Z)
- Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods [51.54704494242525]
We propose a new unifying framework, Birch SGD, for analyzing and designing distributed SGD methods. Using Birch SGD, we design eight new methods and analyze them alongside previously known ones, with at least six of the new methods shown to have optimal computational time complexity. Our research leads to two key insights: (i) all methods share the same "iteration rate" of $O\left(\frac{(R + 1) L \Delta}{\varepsilon} + \frac{\sigma^2 L \Delta}{\varepsilon^2}\right)$, where $R$
(2025-05-14T08:37:45Z)
- Tree-NeRV: A Tree-Structured Neural Representation for Efficient Non-Uniform Video Encoding [26.638854682076733]
Implicit Neural Representations for Videos (NeRV) have emerged as a powerful paradigm for video representation. Existing NeRV-based methods rely on uniform sampling along the temporal axis, leading to suboptimal rate-distortion (RD) performance. We propose Tree-NeRV, a novel tree-structured feature representation for efficient and adaptive video encoding.
(2025-04-17T12:40:33Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
(2023-05-24T11:05:12Z)
- Principal Geodesic Analysis of Merge Trees (and Persistence Diagrams) [8.430851504111585]
We introduce an efficient, iterative algorithm which exploits shared-memory parallelism, as well as an analytic expression of the fitting energy gradient.
We show the utility of our contributions by extending two typical PCA applications to merge trees.
We present a dimensionality reduction framework exploiting the first two directions of the MT-PGA basis to generate two-dimensional layouts.
(2022-07-22T09:17:22Z)
- L4KDE: Learning for KinoDynamic Tree Expansion [28.63535068379981]
We present the Learning for KinoDynamic Tree Expansion (L4KDE) method for kinodynamic planning.
L4KDE uses a neural network to predict transition costs between queried states, which can be efficiently computed in batch.
We empirically demonstrate the significant performance improvement provided by L4KDE on a variety of challenging system dynamics.
(2022-03-02T09:33:45Z)
- Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
To relax the geometric constraint, we analyze it by reformulating it as a Markov Random Field and introduce a learnable unary term.
For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells and whistles.
(2020-12-07T07:16:47Z)
- On the spatial attention in Spatio-Temporal Graph Convolutional Networks for skeleton-based human action recognition [97.14064057840089]
Graph convolutional networks (GCNs) achieve promising performance in skeleton-based human action recognition by modeling a sequence of skeletons as a graph.
Most recently proposed GCN-based methods improve performance by learning the graph structure at each layer of the network.
(2020-11-07T19:03:04Z)
- From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering [33.000371053304676]
We present the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees.
We show that even approximate solutions found with gradient descent are of higher quality than agglomerative clusterings.
We also highlight the flexibility of HypHC using end-to-end training in a downstream classification task.
(2020-10-01T13:43:19Z)
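The linkage rule summarized in the "Hierarchical clustering with dot products recovers hidden tree structure" entry (merge clusters by maximum average dot product, not by minimum distance) can be sketched as follows. This is a naive, hypothetical Python illustration, not the authors' code: the function names are invented, and it rescores every pair of clusters on each round, so it is only suitable for small inputs.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Cluster = Tuple[int, ...]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def avg_dot(a: Cluster, b: Cluster, points: List[Sequence[float]]) -> float:
    """Average dot product over all cross-cluster pairs (i in a, j in b)."""
    total = sum(dot(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def dot_product_linkage(points: List[Sequence[float]]) -> List[Tuple[Cluster, Cluster]]:
    """Agglomerative clustering that repeatedly merges the pair of clusters with
    the MAXIMUM average dot product. The returned merge sequence encodes the
    binary tree, from the first (deepest) merge to the root."""
    clusters: List[Cluster] = [(i,) for i in range(len(points))]
    merges: List[Tuple[Cluster, Cluster]] = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: avg_dot(pair[0], pair[1], points))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b))
    return merges


# Usage: two loose groups of 2-D points.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
print(dot_product_linkage(points))
# [((0,), (1,)), ((2,), (3,)), ((0, 1), (2, 3))]
```

Replacing `avg_dot` with the negated average Euclidean distance would turn this into standard average linkage, which is the contrast the entry draws.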