Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective
- URL: http://arxiv.org/abs/2204.04730v2
- Date: Tue, 13 Aug 2024 07:30:54 GMT
- Title: Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective
- Authors: Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li,
- Abstract summary: We propose to model deep NRSfM from a sequence-to-sequence translation perspective.
First, we apply a shape-motion predictor to estimate the initial non-rigid shape and camera motion from a single frame.
Then we propose a context modeling module to model camera motions and complex non-rigid shapes.
- Score: 81.56957468529602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directly regressing the non-rigid shape and camera pose from the individual 2D frame is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem. This frame-by-frame 3D reconstruction pipeline overlooks the inherent spatial-temporal nature of NRSfM, i.e., reconstructing the whole 3D sequence from the input 2D sequence. In this paper, we propose to model deep NRSfM from a sequence-to-sequence translation perspective, where the input 2D frame sequence is taken as a whole to reconstruct the deforming 3D non-rigid shape sequence. First, we apply a shape-motion predictor to estimate the initial non-rigid shape and camera motion from a single frame. Then we propose a context modeling module to model camera motions and complex non-rigid shapes. To tackle the difficulty in enforcing the global structure constraint within the deep framework, we propose to impose the union-of-subspace structure by replacing the self-expressiveness layer with multi-head attention and delayed regularizers, which enables end-to-end batch-wise training. Experimental results across different datasets such as Human3.6M, CMU Mocap and InterHand prove the superiority of our framework.
Related papers
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion [43.07285784556328]
Existing single image-to-3D creation methods typically involve a two-stage process.
We introduce a unified 3D generation framework, named Ouroboros3D, which integrates multi-view image generation and 3D reconstruction.
arXiv Detail & Related papers (2024-06-05T12:15:22Z) - Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion
Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from 2D observations sequence.
arXiv Detail & Related papers (2023-08-18T16:41:57Z) - Controllable GAN Synthesis Using Non-Rigid Structure-from-Motion [1.7205106391379026]
We present an approach for combining non-rigid structure-from-motion (NRSfM) with deep generative models.
We propose an efficient framework for discovering trajectories in the latent space of 2D GANs corresponding to changes in 3D geometry.
arXiv Detail & Related papers (2022-11-14T08:37:55Z) - 3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow [61.62796058294777]
Reconstructing 3D shape from a single 2D image is a challenging task.
Most of the previous methods still struggle to extract semantic attributes for 3D reconstruction task.
We propose 3DAttriFlow to disentangle and extract semantic attributes through different semantic levels in the input images.
arXiv Detail & Related papers (2022-03-29T02:03:31Z) - 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so
Na\"ive [28.720272938306692]
We propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
arXiv Detail & Related papers (2021-12-23T16:09:23Z) - CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects
from Point Clouds [97.63549045541296]
We propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances and per-part pose tracking for articulated objects.
Our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN, BMVC) at the fastest FPS 12.
arXiv Detail & Related papers (2021-04-08T00:14:58Z) - Dense Non-Rigid Structure from Motion: A Manifold Viewpoint [162.88686222340962]
Non-Rigid Structure-from-Motion (NRSfM) problem aims to recover 3D geometry of a deforming object from its 2D feature correspondences across multiple frames.
We show that our approach significantly improves accuracy, scalability, and robustness against noise.
arXiv Detail & Related papers (2020-06-15T09:15:54Z) - SparseFusion: Dynamic Human Avatar Modeling from Sparse RGBD Images [49.52782544649703]
We propose a novel approach to reconstruct 3D human body shapes based on a sparse set of RGBD frames.
The main challenge is how to robustly fuse these sparse frames into a canonical 3D model.
Our framework is flexible, with potential applications going beyond shape reconstruction.
arXiv Detail & Related papers (2020-06-05T18:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.