Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation
Perspective
- URL: http://arxiv.org/abs/2204.04730v1
- Date: Sun, 10 Apr 2022 17:13:52 GMT
- Title: Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation
Perspective
- Authors: Hui Deng and Tong Zhang and Yuchao Dai and Jiawei Shi and Yiran Zhong
and Hongdong Li
- Abstract summary: We propose to model deep NRSfM from a sequence-to-sequence translation perspective.
First, we apply a shape-motion predictor to estimate the initial non-rigid shape and camera motion from a single frame.
Then we propose a context modeling module to model camera motions and complex non-rigid shapes.
- Score: 95.26840571484443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Directly regressing the non-rigid shape and camera pose from the individual
2D frame is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem.
This frame-by-frame 3D reconstruction pipeline overlooks the inherent
spatial-temporal nature of NRSfM, i.e., reconstructing the whole 3D sequence
from the input 2D sequence. In this paper, we propose to model deep NRSfM from
a sequence-to-sequence translation perspective, where the input 2D frame
sequence is taken as a whole to reconstruct the deforming 3D non-rigid shape
sequence. First, we apply a shape-motion predictor to estimate the initial
non-rigid shape and camera motion from a single frame. Then we propose a
context modeling module to model camera motions and complex non-rigid shapes.
To tackle the difficulty in enforcing the global structure constraint within
the deep framework, we propose to impose the union-of-subspace structure by
replacing the self-expressiveness layer with multi-head attention and delayed
regularizers, which enables end-to-end batch-wise training. Experimental
results across different datasets such as Human3.6M, CMU Mocap and InterHand
prove the superiority of our framework. The code will be made publicly
available.
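As a rough illustration of the abstract's idea of replacing a self-expressiveness layer with multi-head attention, the sketch below applies multi-head self-attention across the frames of a sequence. A self-expressiveness layer writes each sample as a weighted combination of the other samples; here each row of an attention matrix plays a similar role, mixing the per-frame features of the whole sequence. This is not the authors' implementation: the function name `multi_head_self_attention`, the feature sizes, and the random projection matrices (standing in for learned weights) are all assumptions made for illustration, and the shape-motion predictor and delayed regularizers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(feats, num_heads, rng):
    """Hypothetical helper: attention across frames.

    feats: (F, D) array, one feature vector per frame.
    Returns the refined features (F, D) and the attention weights (H, F, F).
    """
    num_frames, dim = feats.shape
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = dim // num_heads

    # Random projections stand in for learned query/key/value weights.
    w_q, w_k, w_v = (
        rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
    )

    def split_heads(x):
        # (F, D) -> (H, F, head_dim)
        return x.reshape(num_frames, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(feats @ w_q)
    k = split_heads(feats @ w_k)
    v = split_heads(feats @ w_v)

    # Each row of attn[h] weights all frames of the sequence and sums to 1.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim), axis=-1)

    # Mix frame features per head, then merge the heads back to (F, D).
    out = (attn @ v).transpose(1, 0, 2).reshape(num_frames, dim)
    return out, attn


rng = np.random.default_rng(0)
frames, dim = 8, 16  # illustrative sizes, not taken from the paper
feats = rng.standard_normal((frames, dim))
refined, attn = multi_head_self_attention(feats, num_heads=4, rng=rng)
```

Because the attention weights are computed from the frames of one sequence at a time, a layer of this kind can be trained batch-wise, which is the property the abstract highlights.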
Related papers
- Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion [43.07285784556328]
Existing single image-to-3D creation methods typically involve a two-stage process.
We introduce a unified 3D generation framework, named Ouroboros3D, which integrates multi-view image generation and 3D reconstruction.
arXiv Detail & Related papers (2024-06-05T12:15:22Z)
- Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and a frame-by-frame skeleton deformation.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations.
arXiv Detail & Related papers (2023-08-18T16:41:57Z)
- Controllable GAN Synthesis Using Non-Rigid Structure-from-Motion [1.7205106391379026]
We present an approach for combining non-rigid structure-from-motion (NRSfM) with deep generative models.
We propose an efficient framework for discovering trajectories in the latent space of 2D GANs corresponding to changes in 3D geometry.
arXiv Detail & Related papers (2022-11-14T08:37:55Z)
- 3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow [61.62796058294777]
Reconstructing 3D shape from a single 2D image is a challenging task.
Most previous methods still struggle to extract semantic attributes for the 3D reconstruction task.
We propose 3DAttriFlow to disentangle and extract semantic attributes through different semantic levels in the input images.
arXiv Detail & Related papers (2022-03-29T02:03:31Z)
- 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve [28.720272938306692]
We propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
arXiv Detail & Related papers (2021-12-23T16:09:23Z)
- CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds [97.63549045541296]
We propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances and per-part pose tracking for articulated objects.
Our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN, BMVC) at the fastest FPS 12.
arXiv Detail & Related papers (2021-04-08T00:14:58Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame.
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
- Dense Non-Rigid Structure from Motion: A Manifold Viewpoint [162.88686222340962]
The Non-Rigid Structure-from-Motion (NRSfM) problem aims to recover the 3D geometry of a deforming object from its 2D feature correspondences across multiple frames.
We show that our approach significantly improves accuracy, scalability, and robustness against noise.
arXiv Detail & Related papers (2020-06-15T09:15:54Z)