FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow
- URL: http://arxiv.org/abs/2307.08100v1
- Date: Sun, 16 Jul 2023 16:58:37 GMT
- Title: FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow
- Authors: Jihyun Lee, Junbong Jang, Donghwan Kim, Minhyuk Sung, Tae-Kyun Kim
- Abstract summary: Recent 4D shape representations do not capture implicit correspondences between articulated shapes or regularize jittery temporal deformations.
To effectively model spatio-temporal deformations of articulated hands, we compose our 4D representation based on two types of Fourier query flow.
Our method achieves state-of-the-art results on video-based 4D reconstruction while being more efficient than the existing 3D/4D implicit shape representations.
- Score: 55.61843393812704
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent 4D shape representations model continuous temporal evolution of
implicit shapes by (1) learning query flows without leveraging shape and
articulation priors or (2) decoding shape occupancies separately for each time
value. Thus, they do not effectively capture implicit correspondences between
articulated shapes or regularize jittery temporal deformations. In this work,
we present FourierHandFlow, which is a spatio-temporally continuous
representation for human hands that combines a 3D occupancy field with
articulation-aware query flows represented as Fourier series. Given an input
RGB sequence, we aim to learn a fixed number of Fourier coefficients for each
query flow to guarantee smooth and continuous temporal shape dynamics. To
effectively model spatio-temporal deformations of articulated hands, we compose
our 4D representation based on two types of Fourier query flow: (1) pose flow
that models query dynamics influenced by hand articulation changes via implicit
linear blend skinning and (2) shape flow that models query-wise displacement
flow. In the experiments, our method achieves state-of-the-art results on
video-based 4D reconstruction while being computationally more efficient than
the existing 3D/4D implicit shape representations. We additionally show our
results on motion inter- and extrapolation and texture transfer using the
learned correspondences of implicit shapes. To the best of our knowledge,
FourierHandFlow is the first neural 4D continuous hand representation learned
from RGB videos. The code will be publicly accessible.
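The core idea of a Fourier query flow, as described in the abstract, can be sketched independently of the authors' implementation: a query point's trajectory over time is parameterized by a fixed, finite set of Fourier coefficients, so the resulting deformation is smooth and continuous in time by construction. The following is a minimal, hypothetical illustration (the function name, coefficient layout, and time normalization are assumptions, not the paper's code):

```python
import numpy as np

def fourier_flow(query, coeffs, t):
    """Evaluate a truncated Fourier-series displacement for one query point.

    query  : (3,) initial 3D query position
    coeffs : (K, 2, 3) cosine/sine coefficients for K harmonics, per axis
    t      : time value normalized to [0, 1]
    Returns the deformed query position at time t.
    """
    K = coeffs.shape[0]
    k = np.arange(1, K + 1)                               # harmonic indices 1..K
    basis = np.stack([np.cos(2 * np.pi * k * t),
                      np.sin(2 * np.pi * k * t)], axis=1)  # (K, 2) Fourier basis
    displacement = np.einsum("kb,kbd->d", basis, coeffs)   # sum over harmonics
    return query + displacement
```

Because the series has a fixed, finite number of harmonics, the trajectory is infinitely differentiable in t, which illustrates why a Fourier parameterization regularizes jittery temporal deformations; in the paper this idea is applied to both a pose flow (via implicit linear blend skinning) and a shape flow.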
Related papers
- FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging [38.70420710947938]
The temporal interpolation task for 4D medical imaging plays a crucial role in the clinical practice of respiratory motion modeling. We propose a Fourier basis-guided Diffusion model, termed FB-Diff. We show that FB-Diff achieves state-of-the-art metrics with better temporal consistency while maintaining promising reconstruction metrics.
arXiv Detail & Related papers (2025-07-06T21:39:48Z)
- UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion [51.31220416754788]
We present UDiFF, a 3D diffusion model for unsigned distance fields (UDFs) which can generate textured 3D shapes with open surfaces from text conditions or unconditionally.
Our key idea is to generate UDFs in spatial-frequency domain with an optimal wavelet transformation, which produces a compact representation space for UDF generation.
arXiv Detail & Related papers (2024-04-10T09:24:54Z)
- Equivariant Graph Neural Operator for Modeling 3D Dynamics [148.98826858078556]
We propose the Equivariant Graph Neural Operator (EGNO), which directly models dynamics as trajectories instead of just next-step predictions.
EGNO explicitly learns the temporal evolution of 3D dynamics where we formulate the dynamics as a function over time and learn neural operators to approximate it.
Comprehensive experiments in multiple domains, including particle simulations, human motion capture, and molecular dynamics, demonstrate the significantly superior performance of EGNO against existing methods.
arXiv Detail & Related papers (2024-01-19T21:50:32Z)
- Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking [52.393359791978035]
Motion2VecSets is a 4D diffusion model for dynamic surface reconstruction from point cloud sequences.
We parameterize 4D dynamics with latent sets instead of using global latent codes.
For more temporally-coherent object tracking, we synchronously denoise deformation latent sets and exchange information across multiple frames.
arXiv Detail & Related papers (2024-01-12T15:05:08Z)
- Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle [9.082693946898733]
We introduce a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos.
In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS).
Our proposed approach showcases a substantial efficiency improvement, achieving a $5\times$ faster training speed compared to the per-frame 3DGS modeling.
arXiv Detail & Related papers (2023-12-06T11:25:52Z)
- Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features [27.44390031735071]
Diff3F is a class-agnostic feature descriptor for untextured input shapes.
We distill diffusion features from image foundational models onto input shapes.
In the process, we produce (diffusion) features in 2D that we subsequently lift and aggregate on the original surface.
arXiv Detail & Related papers (2023-11-28T18:27:15Z)
- DiffusionSDF: Conditional Generative Modeling of Signed Distance Functions [42.015077094731815]
DiffusionSDF is a generative model for shape completion, single-view reconstruction, and reconstruction of real-scanned point clouds.
We use neural signed distance functions (SDFs) as our 3D representation to parameterize the geometry of various signals (e.g., point clouds, 2D images) through neural networks.
arXiv Detail & Related papers (2022-11-24T18:59:01Z)
- Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction [43.60322886598972]
This paper focuses on the task of 4D shape reconstruction from a sequence of point clouds.
We present a novel pipeline to learn a temporal evolution of the 3D human shape through capturing continuous transformation functions among cross-frame occupancy fields.
arXiv Detail & Related papers (2021-03-30T13:36:03Z)
- Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many downstream tasks.
We will release our code and pre-trained models for further research.
arXiv Detail & Related papers (2021-02-01T20:58:45Z)
- SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates [61.04823927283092]
We propose to represent 3D shapes using 2D functions, where the output of the function at each 2D location is a sequence of line segments inside the shape.
We implement this approach using a Seq2Seq model with attention, called SeqXY2SeqZ, which learns the mapping from a sequence of 2D coordinates along two arbitrary axes to a sequence of 1D locations along the third axis.
Our experiments show that SeqXY2SeqZ outperforms the state-of-the-art methods under widely used benchmarks.
arXiv Detail & Related papers (2020-03-12T00:24:36Z)
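The 1D occupancy-segment representation used by SeqXY2SeqZ above can be illustrated independently of the model itself: at each (x, y) location, the shape reduces to the run-length intervals of occupied cells along the z axis. A minimal sketch of that encoding (a hypothetical helper, not the paper's code):

```python
def occupancy_segments(column):
    """Encode a 1D boolean occupancy column as (start, end) segments.

    column : sequence of bools, occupancy along the z axis at one (x, y).
    Returns a list of inclusive (z_start, z_end) index pairs, i.e. the kind
    of per-location 1D segment sequence SeqXY2SeqZ predicts.
    """
    segments, start = [], None
    for z, occupied in enumerate(column):
        if occupied and start is None:
            start = z                        # a segment opens here
        elif not occupied and start is not None:
            segments.append((start, z - 1))  # the open segment closes
            start = None
    if start is not None:                    # column ends inside a segment
        segments.append((start, len(column) - 1))
    return segments
```

This makes the efficiency appeal of the representation concrete: a mostly empty or mostly solid column compresses to a handful of interval endpoints instead of a full voxel column.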
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.